
TWI765339B - Stereoscopic Image Recognition and Matching System - Google Patents


Info

Publication number: TWI765339B
Application number: TW109130809A
Authority: TW (Taiwan)
Prior art keywords: module, image, gaussian, coupled, pyramid
Other versions: TW202211681A (Chinese)
Inventors: 許陳鑑, 黃而旭, 郭建宏, 簡江恆
Original assignee: 國立臺灣師範大學 (National Taiwan Normal University)
Application filed by 國立臺灣師範大學; priority to TW109130809A
Published as TW202211681A; application granted and published as TWI765339B

Landscapes

  • Image Analysis (AREA)

Abstract

A stereoscopic image recognition and matching system, comprising: a first SIFT module, whose input is a left-eye visual image, which performs feature detection and description for the left eye and outputs left-eye image feature points; a second SIFT module, whose input is a right-eye visual image, which performs feature detection and description for the right eye and outputs right-eye image feature points; a coordinate calculation module, coupled to the left-eye and right-eye image feature points, which calculates and outputs the image coordinates of the left-eye and right-eye image feature points; and a stereo feature matching module, coupled to the first SIFT module, the coordinate calculation module, and the second SIFT module, which matches the left-eye and right-eye image feature points according to their image coordinates and outputs the result.

Description

Stereoscopic Image Recognition and Matching System

The present invention relates to a stereoscopic image recognition and matching system, and more particularly to an image recognition system that implements the SIFT image recognition algorithm on an FPGA.

In recent years, thanks to advances in visual sensors and the growing maturity of imaging technology, image recognition has become an indispensable part of computer vision. It is widely used in military, industrial, and medical applications, including image stitching, object recognition, robotic mapping and navigation, 3D modeling, gesture recognition, and video tracking and match moving.

Image recognition chiefly performs feature detection on captured images. Many feature-detection algorithms have been proposed over the past decade; the best known is the Scale-Invariant Feature Transform (SIFT) presented by David G. Lowe at the 1999 International Conference on Computer Vision. SIFT detects feature points in an image and assigns each point a distinct high-dimensional descriptor vector, so that images can be matched by comparing similar feature vectors. Notably, SIFT takes the orientation of each feature point into account and thus resolves the rotation-variance problem of Harris corner detection. Although SIFT yields excellent matching results under changes of scale and viewpoint rotation, its drawback is an enormous computational load that makes the overall computation very time-consuming and precludes real-time operation.
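As a point of reference, the scale-space extremum search at the heart of SIFT detection can be sketched in a few lines of NumPy. This is a minimal software illustration of the detection stage only: the sigma values are illustrative, and sub-pixel refinement, contrast and edge tests, orientation assignment, and matching are all omitted. It is not the hardware architecture claimed below.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(image, sigmas=(1.0, 1.6, 2.56, 4.1)):
    """Detect scale-space extrema in a difference-of-Gaussians stack.

    A point is kept when it is strictly greater or strictly smaller
    than all 26 neighbours in its 3x3x3 scale-space cube.
    """
    blurred = [gaussian_filter(image.astype(float), s) for s in sigmas]
    dogs = [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
    keypoints = []
    for k in range(1, len(dogs) - 1):
        layers = dogs[k - 1], dogs[k], dogs[k + 1]
        h, w = dogs[k].shape
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                cube = np.stack([l[y - 1:y + 2, x - 1:x + 2]
                                 for l in layers]).ravel()
                v = dogs[k][y, x]
                others = np.delete(cube, 13)  # drop the centre itself
                if v > others.max() or v < others.min():
                    keypoints.append((x, y, k))
    return keypoints
```

The nested loops over pixels, scales, and neighbours are exactly the computational load that motivates a parallel FPGA implementation.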

A known prior patent, ROC (Taiwan) patent TW201142718, "Scale-space normalization technique for improved feature detection in uniform and non-uniform illumination changes," concerns methods and techniques for improving the performance and efficiency of an image recognition system. Its method comprises: generating a scale-space image difference by taking the difference between two differently smoothed versions of an image; generating a normalized scale-space image difference by dividing that difference by a third smoothed version of the image, where the third smoothed version is as smooth as, or smoother than, the smoother of the two versions; and using the normalized scale-space image difference to detect one or more features of the image. However, this prior patent does not take the orientation of each feature point into account, so good matching results cannot be obtained under viewpoint rotation.

In recent years, several studies have implemented the SIFT algorithm on FPGA platforms, mainly exploiting parallel processing to reduce computation time. In 2008, Vanderlei Bonato proposed a hardware/software co-design in which part of the SIFT algorithm is accelerated by hardware circuits on an FPGA. In 2014, Jianhui Wang proposed an embedded-system architecture for feature-point detection and matching whose results reached 60 images per second, and Jie Jiang proposed an all-hardware FPGA architecture implementing both SIFT detection and matching.

A review of hardware SIFT implementations over the past five years shows that processing speed remains limited and hardware consumption excessive. Vourvoulakis [see J. Vourvoulakis, J. Kalomiros and J. Lygouras, "A complete processor for SIFT feature matching in video sequences," 2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), Bucharest, RO, pp. 95-100, 2017] proposed an FPGA-based pipelined architecture that processes 640×480 images at up to 70 fps; however, its hardware usage is too large, which limits future expansion on the FPGA. Yum [see J. Yum, C. Lee, J. Kim and H. Lee, "A Novel Hardware Architecture with Reduced Internal Memory for Real-Time Extraction of SIFT in an HD Video," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 10, pp. 1943-1954, Oct. 2016] proposed an architecture that uses external memory to reduce internal register usage; because external memory suffers from bandwidth limitations, a bandwidth-reduction method was also proposed, and the design processes 1280×720 images at 36.85 fps. Yum [see J. Yum, C. Lee, J. Park, J. Kim and H. Lee, "A Hardware Architecture for the Affine-Invariant Extension of SIFT," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 11, pp. 3251-3261, Nov. 2018] proposed a raster-scan scheme with external memory that modifies the affine transform to reduce register-access time; the affine-transform module improved throughput by 325%, reaching 20 fps on 640×480 images. Both of these designs, however, are too slow. Acharya [see K. A. Acharya, R. V. Babu, and S. S. Vadhiyar, "A real-time implementation of SIFT using GPU," Journal of Real-Time Image Processing, vol. 14, no. 2, pp. 267-277, 2018] implemented SIFT on a GPU, reaching 55 fps on 640×480 images, and applied it successfully to target detection and tracking; because SIFT is computationally heavy, GPU optimization improved speed by 12.2% over the version without a GPU, and a hardware implementation of that system would gain more than 12.2%. Li [see S. Li, W. Wang, W. Pan, C. J. Hsu and C. Lu, "FPGA-Based Hardware Design for Scale-Invariant Feature Transform," in IEEE Access, vol. 6, pp. 43850-43864, 2018] proposed an FPGA-based SIFT architecture in which software-computed Gaussian kernels are convolved independently with the original image to build the Gaussian images, feature detection uses an adjugate matrix in place of a divider to reduce hardware usage, and the CORDIC algorithm computes pixel orientation and gradient for the descriptors, processing 640×480 images at 150 fps. Its hardware cost, however, is too high; it provides no hardware feature-matching architecture, so matching must still pass through software; and it lacks a data-valid signal, so communication with external memory produces discontinuous data transfers that cause computation errors and reduce the stability of the system running on the FPGA.

However, an all-hardware FPGA implementation of the SIFT algorithm still requires evaluating exponential functions, handling floating-point numbers, and making heavy use of divider logic gates, so image recognition consumes substantial computation time and cannot achieve real-time recognition.

The object of the present invention is to provide a stereoscopic image recognition and matching system in which the image pyramid construction module, coupled to the image input module, uses software to determine in advance a plurality of Gaussian template mask parameters at different scales and then performs a plurality of convolution operations in parallel through a plurality of Gaussian filter modules, each convolution operating on the image data with one set of mask parameters to obtain a plurality of Gaussian images. This overcomes the hardware floating-point cost that conventional techniques incur by evaluating the exponential function during Gaussian template computation, thereby effectively improving system performance.
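The idea of moving the exponential out of hardware can be illustrated with a short sketch: the Gaussian mask is evaluated and quantised once in software, so the FPGA convolver only ever performs integer multiply-accumulates and a final shift. The mask size, sigma, and fixed-point scale below are illustrative assumptions, not the patent's actual parameters.

```python
import math

def gaussian_kernel_int(size=7, sigma=1.6, scale=256):
    """Precompute a Gaussian mask offline and quantise it to integers.

    The exponential is evaluated once in software; `scale` (a power of
    two, chosen here for illustration) sets the fixed-point precision,
    and the integer weights sum to approximately `scale`.
    """
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[round(v / total * scale) for v in row] for row in kernel]

def convolve_pixel(window, kernel, scale=256):
    """Integer convolution of one window: multiply-accumulate, then a
    division that a power-of-two `scale` reduces to a shift in hardware."""
    acc = sum(w * k for wrow, krow in zip(window, kernel)
              for w, k in zip(wrow, krow))
    return acc // scale
```

Because the quantised weights only approximately sum to `scale`, a flat input reproduces its value to within a small rounding error, which is the usual fixed-point trade-off.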

To achieve the above object, the present invention provides a stereoscopic image recognition and matching system comprising: a first SIFT module, whose input is a left-eye visual image, which performs feature detection and description for the left eye and outputs left-eye image feature points; a second SIFT module, whose input is a right-eye visual image, which performs feature detection and description for the right eye and outputs right-eye image feature points; a coordinate calculation module, coupled to the left-eye and right-eye image feature points, which calculates and outputs their image coordinates; and a stereo feature matching module, coupled to the first SIFT module, the coordinate calculation module, and the second SIFT module, which matches the left-eye and right-eye image feature points according to their image coordinates and outputs the result.

Another object of the present invention is to provide a stereoscopic image recognition and matching system in which the first SIFT module further comprises: a first image pyramid construction module, coupled to the left-eye visual image, which uses software to determine in advance a plurality of Gaussian template mask parameters at different scales and then performs a plurality of convolution operations in parallel through a plurality of Gaussian filter modules, each convolution operating on the image data with one set of mask parameters to obtain a plurality of Gaussian images; the Gaussian template mask parameters at pairs of different scales are then subtracted, and a plurality of convolution operations are performed in parallel through the subtracted Gaussian filter modules, each convolution operating on the image data with one set of mask parameters, to obtain a plurality of difference-of-Gaussian images; a first feature detection module, coupled to the first image pyramid construction module, which performs extremum detection on the image data output by the first image pyramid construction module; a first feature descriptor module, coupled to the first image pyramid construction module, which computes a descriptor for each pixel, finding the orientation and gradient of each point from its surrounding points and building a 64-dimensional descriptor by histogramming the orientation gradients within a statistical range; and a first selector, whose inputs are coupled to the first feature detection module and the first feature descriptor module, for selecting one output.

The second SIFT module further comprises: a second image pyramid construction module, coupled to the right-eye visual image, which uses software to determine in advance a plurality of Gaussian template mask parameters at different scales and then performs a plurality of convolution operations in parallel through a plurality of Gaussian filter modules, each convolution operating on the image data with one set of mask parameters to obtain a plurality of Gaussian images; the Gaussian template mask parameters at pairs of different scales are then subtracted, and a plurality of convolution operations are performed in parallel through the subtracted Gaussian filter modules, each convolution operating on the image data with one set of mask parameters, to obtain a plurality of difference-of-Gaussian images; a second feature detection module, coupled to the second image pyramid construction module, which performs extremum detection on the image data output by the second image pyramid construction module; a second feature descriptor module, coupled to the second image pyramid construction module, which computes a descriptor for each pixel, finding the orientation and gradient of each point from its surrounding points and building a 64-dimensional descriptor by histogramming the orientation gradients within a statistical range; and a second selector, whose inputs are coupled to the second feature detection module and the second feature descriptor module, for selecting one output.

The first image pyramid construction module further comprises: a first Gaussian image pyramid, whose construction requires first establishing spatial images at successive scales, where convolving the initial image with a Gaussian template yields a Gaussian-blurred image; and a first difference-of-Gaussian pyramid, coupled to the first Gaussian image pyramid, in which the Gaussian template mask parameters at pairs of different scales are subtracted and the resulting difference templates are convolved with the original image to obtain a plurality of difference-of-Gaussian images.
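The construction above relies on the linearity of convolution: subtracting two Gaussian templates first and then convolving the difference once with the original image yields the same difference-of-Gaussian image as blurring twice and subtracting, at half the convolution cost. A small NumPy check (with illustrative kernel sizes and sigmas, not the patent's parameters) confirms the identity:

```python
import numpy as np
from scipy.signal import convolve2d

def gauss2d(size, sigma):
    """Normalised 2-D Gaussian mask of the given size and sigma."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

rng = np.random.default_rng(0)
image = rng.random((32, 32))
g1, g2 = gauss2d(7, 1.0), gauss2d(7, 1.6)

# Two convolutions and a subtraction...
dog_two_pass = convolve2d(image, g2, 'same') - convolve2d(image, g1, 'same')
# ...equal one convolution with the pre-subtracted template.
dog_one_pass = convolve2d(image, g2 - g1, 'same')
assert np.allclose(dog_two_pass, dog_one_pass)
```

In hardware this halves the number of convolution datapaths needed for the difference-of-Gaussian pyramid.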

The first feature detection module further comprises: a first extremum detection module, coupled to the first difference-of-Gaussian pyramid, which obtains high-pass images from the difference-of-Gaussian images (produced by subtracting Gaussian templates pairwise and convolving the result with the original image) and uses them to detect extremum features; a first high-contrast detection module, coupled to the first difference-of-Gaussian pyramid, which further comprises a first first-order partial-derivative matrix module, a first Hessian matrix module, a first adjugate matrix module, a first determinant calculation module, a first high-contrast feature detection module, and a first corner detection module, where the output signals of the first first-order partial-derivative matrix module, first adjugate matrix module, first determinant calculation module, and first high-contrast feature detection module are ANDed to compute high-contrast features; a first corner detection module, coupled to the first difference-of-Gaussian pyramid, into which the output of the first Hessian matrix module enters to compute corner features; and a first AND gate, whose inputs are coupled to the first extremum detection module, the first high-contrast detection module, and the first corner detection module, and whose output yields the feature points.
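The role of the Hessian and determinant modules can be illustrated with the classical division-free edge test. In Lowe's formulation a candidate is rejected when tr(H)²/det(H) exceeds (r+1)²/r; cross-multiplying removes the divider, which is the same hardware-saving idea the text associates with replacing division by adjugate/determinant arithmetic. The threshold r = 10 is Lowe's value and an assumption here, not necessarily the patent's:

```python
def is_corner(dxx, dyy, dxy, r=10):
    """Division-free corner/edge test on the 2x2 Hessian.

    Equivalent (for det > 0) to tr(H)^2 / det(H) < (r+1)^2 / r,
    but using only multiplies and a compare - no divider logic.
    """
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # principal curvatures differ in sign: unstable point
    return r * tr * tr < (r + 1) ** 2 * det
```

An isotropic blob (similar curvatures in both directions) passes, while an edge (one curvature much larger than the other) is rejected without any division.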

The first feature descriptor module further comprises: a first gradient calculation module, coupled to the first Gaussian image pyramid, for computing pixel gradients; a first orientation calculation module, coupled to the first Gaussian image pyramid, for computing pixel orientations; a first orientation-gradient range statistics module, coupled to the first gradient calculation module and the first orientation calculation module, for accumulating pixel orientations and gradients; and a first normalization module, coupled to the first orientation-gradient range statistics module, for normalizing the descriptor.
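A software sketch of this descriptor stage may help: each pixel's gradient magnitude and orientation are computed from its four neighbours, magnitudes are accumulated into orientation-histogram bins, and the histogram is finally normalised. The single-cell, 8-bin layout below is a simplification for illustration (eight bins over an 8-cell grid would give the 64-dimensional descriptor mentioned in the text), and a hardware implementation would typically replace `atan2`/`hypot` with a CORDIC-style unit:

```python
import math

def pixel_grad(img, x, y):
    """Gradient magnitude and orientation from the four neighbours."""
    dx = img[y][x + 1] - img[y][x - 1]
    dy = img[y + 1][x] - img[y - 1][x]
    mag = math.hypot(dx, dy)
    ang = math.atan2(dy, dx) % (2 * math.pi)
    return mag, ang

def orientation_histogram(img, bins=8):
    """Accumulate gradient magnitudes into `bins` orientation bins over
    the interior pixels (one descriptor cell), then L2-normalise so the
    result is robust to illumination changes."""
    hist = [0.0] * bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            mag, ang = pixel_grad(img, x, y)
            hist[int(ang / (2 * math.pi) * bins) % bins] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]
```

On a horizontal intensity ramp every gradient points in the same direction, so all the mass lands in a single bin, as expected.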

The second image pyramid construction module further comprises: a second Gaussian blur pyramid, whose construction requires first establishing spatial images at successive scales, where convolving the initial image with a Gaussian template yields a Gaussian-blurred image; and a second difference-of-Gaussian pyramid, coupled to the second Gaussian blur pyramid, in which the Gaussian template mask parameters at pairs of different scales are subtracted and the resulting difference templates are convolved with the original image to obtain a plurality of difference-of-Gaussian images.

The second feature detection module further comprises: a second extremum detection module, coupled to the second difference-of-Gaussian pyramid, which obtains high-pass images from the difference-of-Gaussian images (produced by subtracting Gaussian templates pairwise and convolving the result with the original image) and uses them to detect extremum features; a second high-contrast detection module, coupled to the second difference-of-Gaussian pyramid, which further comprises a second first-order partial-derivative matrix module, a second Hessian matrix module, a second adjugate matrix module, a second determinant calculation module, a second high-contrast detection module, and a second corner detection module, where the output signals of the second first-order partial-derivative matrix module, second adjugate matrix module, second determinant calculation module, and second high-contrast detection module are ANDed to compute high-contrast features; a second corner detection module, coupled to the second difference-of-Gaussian pyramid, into which the output of the second Hessian matrix module enters to compute corner features; and a second AND gate, whose inputs are coupled to the second extremum detection module, the second high-contrast detection module, and the second corner detection module, and whose output yields the feature points.

The second feature descriptor module further comprises: a second gradient calculation module, coupled to the second Gaussian blur pyramid, for computing pixel gradients; a second orientation calculation module, coupled to the second Gaussian blur pyramid, for computing pixel orientations; a second orientation-gradient range statistics module, coupled to the second gradient calculation module and the second orientation calculation module, for accumulating pixel orientations and gradients; and a second normalization module, coupled to the second orientation-gradient range statistics module, for normalizing the descriptor.

The stereo feature matching module further comprises: a serial-to-parallel memory, coupled to the first SIFT module and the second SIFT module, which stores a plurality of left- and right-image feature points and their coordinates sequentially in a plurality of registers and outputs the stored feature-point information simultaneously, converting serial input to parallel output; a minimum-dimension calculation module, coupled to the serial-to-parallel memory, which, when a right-image feature-point signal arrives, compares the y coordinates of the stored left-image feature points with the coordinate of the right-image point R for equality; a matching module, coupled to the minimum-dimension calculation module, which finds the minimum of the output distances and finally checks whether that minimum is too large, rejecting it if it exceeds a threshold; and a depth calculation module, coupled to the matching module, which calculates the real-world distance of the feature point from the stereo camera, i.e. its depth, mainly by relating the distance between matched points on the image to the actual distance via similar triangles.
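The matching and depth steps above can be sketched as follows: a right-image feature point is compared only against left-image points on the same row (equal y), the candidate with the minimum descriptor distance wins, a minimum that exceeds a threshold is discarded, and depth follows from similar triangles as Z = focal length × baseline / disparity. The feature-point format, threshold, and camera parameters below are illustrative assumptions, not values from the patent:

```python
def match_and_depth(left_feats, right_feats, focal_px, baseline_m,
                    max_dist=0.5):
    """Epipolar matching and depth by similar triangles.

    Each feature is a (x, y, descriptor) tuple. Returns a list of
    ((x_left, y), (x_right, y), depth) matches.
    """
    def sq_dist(d1, d2):
        return sum((a - b) ** 2 for a, b in zip(d1, d2))

    results = []
    for (xr, yr, dr) in right_feats:
        # Epipolar (same-row) constraint: only compare equal-y points.
        candidates = [f for f in left_feats if f[1] == yr]
        if not candidates:
            continue
        best = min(candidates, key=lambda f: sq_dist(f[2], dr))
        if sq_dist(best[2], dr) > max_dist:
            continue  # minimum distance too large: reject the match
        disparity = best[0] - xr
        if disparity > 0:
            # Similar triangles: depth = focal * baseline / disparity.
            depth = focal_px * baseline_m / disparity
            results.append(((best[0], yr), (xr, yr), depth))
    return results
```

For example, with a 700-pixel focal length, a 0.1 m baseline, and a 4-pixel disparity, the matched point lies 700 × 0.1 / 4 = 17.5 m from the camera.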

So that the examiners may further understand the structure, features, and objects of the present invention, drawings and a detailed description of preferred embodiments are provided below.

Please refer to FIG. 1, which is a schematic diagram of the assembled stereoscopic image recognition and matching system according to a preferred embodiment of the present invention.

As shown in FIG. 1, the stereoscopic image recognition and matching system of the present invention comprises: a first SIFT module 100; a second SIFT module 200; a coordinate calculation module 300; and a stereo feature matching module 400.

The input of the first SIFT module 100 is a left-eye visual image; the module performs feature detection and description for the left eye and outputs left-eye image feature points.

The input of the second SIFT module 200 is a right-eye visual image; the module performs feature detection and description for the right eye and outputs right-eye image feature points.

The coordinate calculation module 300 is coupled to the left-eye and right-eye image feature points, and calculates and outputs their image coordinates.

The stereo feature matching module 400 is coupled to the first SIFT module 100, the coordinate calculation module 300, and the second SIFT module 200, and matches the left-eye and right-eye image feature points according to their image coordinates before output.

Please refer to FIGS. 2 and 3 together. FIG. 2 is a detailed block diagram of the first SIFT module according to a preferred embodiment of the present invention; FIG. 3 is a detailed block diagram of the second SIFT module according to a preferred embodiment of the present invention.

As shown in FIG. 2, the first SIFT module 100 further comprises: a first image pyramid construction module 110; a first feature detection module 120; a first feature descriptor module 130; and a first selector 140.

The first image pyramid construction module 110 is coupled to the left-eye visual image and further includes a first Gaussian image pyramid 111 and a first difference-of-Gaussian (DoG) pyramid 112. For the first Gaussian image pyramid 111, a plurality of Gaussian mask parameters at different scales are first computed in software; a plurality of Gaussian filter modules (not shown) then perform convolutions in parallel, each convolving the image data with one set of mask parameters to obtain a plurality of Gaussian images. The Gaussian images are then fed in pairs into the first difference-of-Gaussian pyramid 112, where they are subtracted.
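The software precomputation step can be sketched as follows. The 7×7 size, the 8-bit quantization scale, and the σ schedule used here are illustrative assumptions, not values fixed by this specification; the point is that the hardware later convolves with fixed integer coefficients and never evaluates the exponential function itself:

```python
import math

def gaussian_mask(sigma, size=7, scale=255):
    """Precompute one Gaussian mask in software and quantize it to 8-bit
    integer coefficients; the hardware then only convolves with these
    fixed numbers."""
    half = size // 2
    m = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(sum(row) for row in m)
    return [[round(v / total * scale) for v in row] for row in m]

# One mask per scale of the pyramid (sigma schedule assumed for illustration).
masks = [gaussian_mask(1.6 * 2 ** (i / 3.0)) for i in range(4)]
```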

For the convolutions in the image pyramid, the present invention uses an 8-bit 7×7 mask. To minimize the number of registers, the design uses 49 8-bit registers together with six RAM-based shift registers, which are optimized building blocks provided by Altera and offer lower hardware cost and higher operating speed. Since the input image width is, for example but not limited to, 640 pixels, each RAM-based shift-register array holds 633 entries (the 640-pixel line width minus the 7-pixel window).
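A behavioral sketch of that shift-register pipeline, in Python rather than HDL and purely to illustrate the data movement; the window and buffer sizes follow the 7×7 mask and per-line width described above:

```python
from collections import deque

def stream_7x7_windows(pixels, width):
    """Simulate 49 window registers plus six row buffers of (width - 7)
    entries each (633 for a 640-pixel line): every pixel shifted out of a
    window row enters the row buffer, whose output feeds the next window
    row, so a full 7x7 window is available per clock once primed.
    (Windows straddling a line wrap would be discarded by a real pipeline.)"""
    window = [[0] * 7 for _ in range(7)]
    buffers = [deque([0] * (width - 7)) for _ in range(6)]
    count = 0
    for p in pixels:
        carry = p
        for r in range(7):
            window[r].append(carry)
            spilled = window[r].pop(0)
            if r < 6:
                buffers[r].append(spilled)
                carry = buffers[r].popleft()
        count += 1
        if count >= 6 * width + 7:          # pipeline primed
            yield [row[:] for row in window]
```

Here `window[0]` is the current image row and `window[6]` the row six lines above, matching a 7×7 neighborhood ending at the current pixel.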

The first feature detection module 120 is coupled to the first image pyramid construction module 110 and performs extremum detection on the image data output by the first image pyramid construction module 110.

The first feature descriptor module 130 is coupled to the first image pyramid construction module 110 and computes a descriptor for each pixel: the orientation and gradient of the pixel are obtained from its neighboring points, and a range-statistics module accumulates the orientation gradients within the range to build a 64-dimensional descriptor.

The inputs of the first selector 140 are coupled to the first feature detection module 120 and the first feature descriptor module 130, and one of them is selected for output.

The first image pyramid construction module 110 mainly finds the continuous scale-space differences needed for feature detection; it contains the first Gaussian image pyramid 111 and the first difference-of-Gaussian pyramid 112. The first Gaussian image pyramid 111 blurs the initial image through Gaussian filter modules and removes image noise, ensuring that the resulting Gaussian images are not disturbed by noise in subsequent operations. The first difference-of-Gaussian pyramid 112 subtracts the Gaussian masks pairwise and convolves the result with the original image to produce difference-of-Gaussian images that outline the image contours and preserve the image features as the basis for subsequent feature point detection. Methods for accelerating the generation of Gaussian and difference pyramids have been proposed before, for example in ROC Invention Patent No. I592897; although that approach can compute the difference images, it spends more hardware and time on the subtraction.

Please refer to FIG. 4, which illustrates how the first SIFT module of a preferred embodiment of the present invention completes the Gaussian and difference images simultaneously with a parallel-processing architecture.

As shown in the figure, to further speed up the construction of the image pyramid, the present invention uses a parallel architecture in which the first Gaussian image pyramid 111 and the plurality of first difference-of-Gaussian pyramids 112 operate simultaneously on the initial image windowed by a mask 115, for example but not limited to a 7×7 mask, so that the Gaussian images and the difference images are completed at the same time. Each difference image, $D_n(x,y)$, is produced by first subtracting two adjacent Gaussian kernels and then performing the convolution:

$$D_n(x,y) = \bigl(G_{n+1}(x,y) - G_n(x,y)\bigr) * I(x,y) = L_{n+1}(x,y) - L_n(x,y) \tag{1}$$

where $L_n(x,y)$ and $G_n(x,y)$ are the Gaussian image and Gaussian kernel of the $n$-th layer, respectively, and $I(x,y)$ is the initial image. This not only saves a large amount of hardware but also speeds up operation. Here, the present invention keeps only one Gaussian image for subsequent processing. Under this architecture, when the first datum arrives, only one clock cycle is needed to output the window of the 7×7 mask 115 and convolve it with the Gaussian kernels; subsequent masks of other sizes can be implemented with the same architecture.
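The linearity that equation (1) relies on — convolving with the difference of two kernels equals the difference of the two blurred images — can be checked with a tiny 1-D example (plain Python; the two kernels are illustrative stand-ins for adjacent Gaussian kernels):

```python
def conv(f, g):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

signal = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]
g_narrow = [0.05, 0.25, 0.40, 0.25, 0.05]
g_wide = [0.10, 0.20, 0.40, 0.20, 0.10]
kernel_diff = [a - b for a, b in zip(g_wide, g_narrow)]

# (G_{n+1} - G_n) * I  equals  L_{n+1} - L_n
dog_direct = conv(signal, kernel_diff)
dog_subtract = [a - b for a, b in zip(conv(signal, g_wide),
                                      conv(signal, g_narrow))]
```

The hardware exploits exactly this identity: one convolution with the precomputed kernel difference replaces two convolutions plus a subtractor.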

After the first image pyramid is built, the feature detection in the first feature detection module 120 performs matrix operations on three consecutive difference images output by the first image pyramid construction module 110 and completes three kinds of detection: extremum, high contrast, and corner.

Please refer to FIG. 5, which illustrates the hardware architecture of the feature detection in the first feature detection module of the first SIFT module according to a preferred embodiment of the present invention.

As shown in the figure, the feature-detection hardware of the first feature detection module 120 at least includes: a first extremum detection module 121; a first high-contrast detection module 122; a first corner detection module 123; a first first-order partial-derivative matrix module 124; a first Hessian matrix module 125; a first adjugate matrix module 126; and a first determinant calculation module 127. A point is marked as a feature point only when the results of the first extremum detection module 121, the first high-contrast detection module 122, and the first corner detection module 123 are all satisfied. The first extremum detection module 121 detects both maxima and minima.

In the first extremum detection module 121, the 27 data (a 3-D DoG neighborhood) produced by a plurality of masks 128, for example but not limited to 3×3 masks, are examined for a maximum or minimum. The detection compares the center value of the second-layer difference image with the 26 surrounding data; if the result of the comparison is a maximum or a minimum, the present invention marks the point as a feature point. However, feature points that pass only the extremum test may still be hard to distinguish or corrupted by noise, so the present invention uses the high-contrast detection module 122 and the corner detection module 123 to constrain the extremum features with high-contrast and corner criteria. As shown in FIG. 5, the outputs of the first extremum detection module 121, the first high-contrast detection module 122, and the first corner detection module 123 pass through a first AND gate 129 so that only the feature points required by the present invention are kept, improving the situation where feature points are hard to identify and susceptible to noise. When performing corner detection and high-contrast detection on the plane, the first corner detection module 123 needs second-order partial derivatives; in hardware, the present invention represents them with the first Hessian matrix module 125. For the corner-detection computation, please refer to the specification of ROC Invention Patent No. I592897, which is not repeated here. The way the present invention computes high contrast differs from that of Patent No. I592897: the first high-contrast detection module 122 performs high-contrast detection with the first first-order partial-derivative matrix module 124, the first Hessian matrix module 125, the first adjugate matrix module 126, and the first determinant calculation module 127, as follows. The input of the first adjugate matrix module 126 is the elements of the second-order partial-derivative matrix (the Hessian), and the adjugate is computed in the standard way (see, e.g., https://zh.wikipedia.org/wiki/%E4%BC%B4%E9%9A%8F%E7%9F%A9%E9%98%B5). In practice the adjugate takes the form of equation (2), in which $Adj_{12}=Adj_{21}$, $Adj_{13}=Adj_{31}$, and $Adj_{23}=Adj_{32}$, because the elements of the input Hessian satisfy $H_{12}=H_{21}$, $H_{13}=H_{31}$, and $H_{23}=H_{32}$:

$$\operatorname{Adj}(\mathbf{H}) = \begin{bmatrix} H_{22}H_{33}-H_{23}^{2} & -(H_{12}H_{33}-H_{13}H_{23}) & H_{12}H_{23}-H_{13}H_{22} \\ -(H_{12}H_{33}-H_{13}H_{23}) & H_{11}H_{33}-H_{13}^{2} & -(H_{11}H_{23}-H_{12}H_{13}) \\ H_{12}H_{23}-H_{13}H_{22} & -(H_{11}H_{23}-H_{12}H_{13}) & H_{11}H_{22}-H_{12}^{2} \end{bmatrix} \tag{2}$$
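The 26-neighbour comparison can be expressed compactly as follows (a functional sketch; the hardware performs all comparisons in parallel rather than in a loop):

```python
def is_extremum(cube):
    """cube[s][y][x]: a 3x3x3 block of DoG values from three consecutive
    scales; the centre of the middle layer is a feature-point candidate
    only if it is strictly larger or strictly smaller than all 26
    surrounding values."""
    c = cube[1][1][1]
    neighbours = [cube[s][y][x]
                  for s in range(3) for y in range(3) for x in range(3)
                  if (s, y, x) != (1, 1, 1)]
    return all(c > v for v in neighbours) or all(c < v for v in neighbours)
```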

The first determinant calculation module 127 computes the determinant of the second-order partial-derivative matrix (the Hessian), as in equation (3), in order to complete the inverse-matrix computation:

$$\det(\mathbf{H}) = H_{11}\,Adj_{11} + H_{12}\,Adj_{12} + H_{13}\,Adj_{13} \tag{3}$$
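The six-element adjugate and the first-row determinant expansion of equations (2) and (3) can be sketched as below; the identity $\mathbf{H}\cdot\operatorname{Adj}(\mathbf{H}) = \det(\mathbf{H})\,\mathbf{I}$ is what lets the hardware defer the single division needed for the inverse:

```python
def adj_sym3(H):
    """Adjugate of a symmetric 3x3 Hessian: only six distinct elements."""
    a, b, c = H[0]
    d, e = H[1][1], H[1][2]
    f = H[2][2]
    A11 = d * f - e * e
    A12 = -(b * f - c * e)
    A13 = b * e - c * d
    A22 = a * f - c * c
    A23 = -(a * e - b * c)
    A33 = a * d - b * b
    return [[A11, A12, A13], [A12, A22, A23], [A13, A23, A33]]

def det_sym3(H):
    """Determinant via the first row of the adjugate, as in equation (3)."""
    A = adj_sym3(H)
    return sum(H[0][k] * A[0][k] for k in range(3))
```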

For high-contrast detection, the variation between the detected point and its surrounding pixels must be computed. In the present invention, a detected point is regarded as having the high-contrast feature and is kept when the pixel variation in the difference image is greater than 1/32 (instead of 0.03, to reduce hardware cost). Let

$$D(\mathbf{X}), \qquad \mathbf{X} = (x, y, s) \tag{4}$$

denote the difference-of-Gaussian value at the three-dimensional offset $\mathbf{X}=(x,y,s)$ from the detected point. Equation (4) can be written, via the Taylor expansion, as the Maclaurin series

$$D(\mathbf{X}) = D(0) + \frac{\partial D(0)}{\partial \mathbf{X}}^{T}\mathbf{X} + \frac{1}{2}\,\mathbf{X}^{T}\mathbf{H}\,\mathbf{X} \tag{5}$$

where $D(0)$ is the value at the detected point position on the plane. The three-dimensional first-order partial derivative is

$$\frac{\partial D}{\partial \mathbf{X}} = \left(\frac{\partial D}{\partial x},\ \frac{\partial D}{\partial y},\ \frac{\partial D}{\partial s}\right)^{T} \tag{6}$$

where

$$\frac{\partial D}{\partial x} = \frac{D(x+1,y,s) - D(x-1,y,s)}{2} \tag{7}$$

$$\frac{\partial D}{\partial y} = \frac{D(x,y+1,s) - D(x,y-1,s)}{2} \tag{8}$$

$$\frac{\partial D}{\partial s} = \frac{D(x,y,s+1) - D(x,y,s-1)}{2} \tag{9}$$

and the second derivative of $D(0)$ can be expressed as the three-dimensional Hessian matrix

$$\mathbf{H} = \begin{bmatrix} D_{xx} & D_{xy} & D_{xs} \\ D_{xy} & D_{yy} & D_{ys} \\ D_{xs} & D_{ys} & D_{ss} \end{bmatrix} \tag{10}$$

Setting the first derivative of equation (5) to zero gives the offset at which the pixel variation is largest:

$$\hat{\mathbf{X}} = -\mathbf{H}^{-1}\,\frac{\partial D(0)}{\partial \mathbf{X}} \tag{11}$$

Because implementing the inverse matrix directly requires a large number of dividers, which greatly increases hardware area and cost, the present invention completes the inverse-matrix operation with the determinant and the adjugate matrix. Since some elements of the adjugate have identical results, only six elements need to be computed in hardware; the determinant is then obtained from the elements of the first row of the adjugate, further reducing hardware cost. Substituting equation (11) back into equation (5) and simplifying gives

$$D(\hat{\mathbf{X}}) = D(0) + \frac{1}{2}\,\frac{\partial D(0)}{\partial \mathbf{X}}^{T}\hat{\mathbf{X}} \tag{12}$$

Finally, combining equation (6) yields

$$\bigl|D(\hat{\mathbf{X}})\bigr| = \left|D(0) - \frac{1}{2}\,\frac{\partial D(0)}{\partial \mathbf{X}}^{T}\mathbf{H}^{-1}\,\frac{\partial D(0)}{\partial \mathbf{X}}\right| > \frac{1}{32} \tag{13}$$
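Putting equations (11)–(13) together, the contrast test needs only one division (by the determinant) when the inverse is formed from the adjugate. A minimal numerical sketch, in plain Python with example numbers only:

```python
def contrast_keep(D0, g, H, threshold=1.0 / 32):
    """Return True if |D(X̂)| = |D0 + 0.5 * gᵀ X̂| exceeds the threshold,
    where X̂ = -H⁻¹ g is computed as -(Adj(H) · g) / det(H)."""
    # Adjugate of the symmetric Hessian: six distinct elements.
    a, b, c = H[0]
    d, e = H[1][1], H[1][2]
    f = H[2][2]
    A = [[d * f - e * e, -(b * f - c * e), b * e - c * d],
         [-(b * f - c * e), a * f - c * c, -(a * e - b * c)],
         [b * e - c * d, -(a * e - b * c), a * d - b * b]]
    det = sum(H[0][k] * A[0][k] for k in range(3))
    x_hat = [-sum(A[i][j] * g[j] for j in range(3)) / det for i in range(3)]
    value = D0 + 0.5 * sum(g[i] * x_hat[i] for i in range(3))
    return abs(value) > threshold
```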

Please refer again to FIG. 2. The first feature descriptor module 130 uses the Gaussian images output by the first image pyramid construction module 110 and accumulates orientation and gradient statistics for each point and its surroundings over a range. The first feature descriptor module 130 further includes: a first gradient calculation module 131, coupled to the first Gaussian image pyramid 111, for calculating the pixel gradients; a first orientation calculation module 132, coupled to the first Gaussian image pyramid 111, for calculating the pixel orientations; a first range statistics module 133, coupled to the first gradient calculation module 131 and the first orientation calculation module 132, for accumulating the pixel orientations and gradients; and a first normalization module 134, coupled to the first range statistics module 133, for normalizing the descriptor.

The first feature descriptor module 130 operates as follows. First, the Gaussian image passes through a 3×3 range mask (not shown), and the left/right and top/bottom pixel values are subtracted to obtain the difference of the pixel in the $x$-axis direction, equation (14), and in the $y$-axis direction, equation (15):

$$\Delta p = L(x+1, y) - L(x-1, y) \tag{14}$$

$$\Delta q = L(x, y+1) - L(x, y-1) \tag{15}$$

where $\Delta p$ is the pixel difference along the $x$ axis and $\Delta q$ is the pixel difference along the $y$ axis.

Equations (14) and (15) are then used to compute the orientation and gradient of the pixel. In [D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004], the angle and gradient of a pixel are computed with equations (16) and (17):

$$\theta(x,y) = \tan^{-1}\!\left(\frac{\Delta q}{\Delta p}\right) \tag{16}$$

$$m(x,y) = \sqrt{\Delta p^{2} + \Delta q^{2}} \tag{17}$$

Implementing such mathematics directly in hardware, however, is extremely costly. The present invention therefore proposes a trigonometric method to obtain the orientation and gradient.

Please refer to FIG. 6(a) and FIG. 6(b) together. FIG. 6(a) illustrates a trigonometric method with gradient calculation according to a preferred embodiment of the present invention; FIG. 6(b) illustrates another trigonometric method with gradient calculation according to a preferred embodiment of the present invention.

As shown in the figures, observing equation (17), one can see that it resembles the Pythagorean theorem: $m$ in equation (17) is the hypotenuse of a right triangle and $\Delta p$ and $\Delta q$ are the two legs, as shown in FIG. 6(a). The larger of $\Delta p$ and $\Delta q$ is then approximated as the hypotenuse $m$, as shown in FIG. 6(b). This approximation introduces an error, which is largest when $\Delta p$ and $\Delta q$ are equal; the present invention reduces the error with a lookup table, as shown in Table 1. The orientation is defined by the position of $(\Delta p, \Delta q)$ in the plane coordinates, as shown in Table 2. Simply put, if $\Delta p$ and $\Delta q$ are both greater than 0 and $\Delta p$ is greater than $\Delta q$, the present invention defines this as direction 0; if both are greater than 0 and $\Delta p$ is less than $\Delta q$, this is defined as direction 1, and so on.

Table 1. Lookup table with gradient correction (pixel gradient m = Max(Δp, Δq) × K)

    h = |Δp/Δq|          Correction factor K
    h ≤ 0.25             1.00
    0.25 < h ≤ 0.52      1.08
    0.52 < h ≤ 0.65      1.17
    0.65 < h ≤ 0.75      1.22
    0.75 < h ≤ 0.85      1.28
    0.85 < h ≤ 0.95      1.35
    0.95 < h ≤ 1.05      1.414
    1.05 < h ≤ 1.15      1.35
    1.15 < h ≤ 1.35      1.28
    1.35 < h ≤ 1.50      1.22

Table 2. Angle range conditions and direction assignment

    Direction    Angle range condition
    0            0°   ≤ (Δp, Δq) < 45°
    1            45°  ≤ (Δp, Δq) < 90°
    2            90°  ≤ (Δp, Δq) < 135°
    3            135° ≤ (Δp, Δq) < 180°
    4            180° ≤ (Δp, Δq) < 225°
    5            225° ≤ (Δp, Δq) < 270°
    6            270° ≤ (Δp, Δq) < 315°
    7            315° ≤ (Δp, Δq) < 360°
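Table 1 and Table 2 can be exercised with a small sketch. Note one assumption: Table 1 lists ratios only up to 1.5, so this sketch folds the ratio to min/max (the symmetric extension is ours, not the specification's):

```python
import math

# Leg-ratio thresholds and correction factors from Table 1 (folded to min/max).
_K_TABLE = [(0.25, 1.00), (0.52, 1.08), (0.65, 1.17),
            (0.75, 1.22), (0.85, 1.28), (0.95, 1.35)]

def gradient_approx(dp, dq):
    """m ≈ Max(|Δp|, |Δq|) × K, with K looked up from the leg ratio,
    avoiding the square root of equation (17)."""
    big, small = max(abs(dp), abs(dq)), min(abs(dp), abs(dq))
    if big == 0:
        return 0.0
    r = small / big
    for limit, k in _K_TABLE:
        if r <= limit:
            return big * k
    return big * 1.414                      # legs nearly equal

def direction(dp, dq):
    """Table 2: index of the 45-degree sector containing (Δp, Δq)."""
    angle = math.atan2(dq, dp) % (2 * math.pi)
    return int(angle // (math.pi / 4))
```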

After the orientation and gradient of every detected point are computed, the data enter the first range statistics module 133, which accumulates the orientations and gradients of 16×16 pixels. The 16×16 range is first divided into 16 blocks of 4×4 pixels each, and 8 orientations are accumulated in each block; afterwards there are 16 blocks with gradients in 8 orientations each, i.e., 128 orientation gradients in total, which represent the feature descriptor of one detected point. However, 128 dimensions consume a large amount of hardware resources, so a dimension-reduction method is proposed. Observing FIG. 6(a), direction 0 and direction 4 are opposite directions, as are direction 1 and direction 5; as vectors they can be treated as reversed vectors. Direction 0 is therefore subtracted from direction 4, direction 1 from direction 5, and so on, with the results represented as signed numbers. In this way the 128-dimensional descriptor is reduced to 64 dimensions while the characteristics of the gradient in every direction are preserved.
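The opposite-direction folding described above can be sketched as follows (the layout of the 128-bin histogram as 16 cells × 8 directions is assumed):

```python
def fold_descriptor(hist128):
    """Reduce a 16x8 orientation histogram to 64 signed values by
    subtracting each direction bin's opposite (d and d + 4)."""
    assert len(hist128) == 128
    folded = []
    for cell in range(16):
        base = cell * 8
        for d in range(4):
            folded.append(hist128[base + d] - hist128[base + d + 4])
    return folded
```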

To reduce the hardware usage of the stereo feature matching module 400 during stereo matching, the number of bits of the descriptor must be reduced while the distribution of the data is preserved. The present invention normalizes the 64 dimensions of the descriptor, compressing it from 64×13 bits to 64×9 bits. Normalizing the descriptor vector $F = (f_1, f_2, \ldots, f_{64})$ with equation (18) yields the normalized descriptor $N = (n_1, n_2, \ldots, n_{64})$, where $S$ is the sum of the original descriptor vector as in equation (19). Since the normalized values are very small, they must be amplified by a weight $w$, giving the amplified descriptor $L = (l_1, l_2, \ldots, l_{64})$ as in equation (20); in the present invention $w = 127$. After normalization by the first normalization module 134, the signed feature descriptor drops from 64×13 = 832 bits to 64×9 = 576 bits, greatly reducing the hardware resources needed to buffer descriptors.

$$n_i = \frac{f_i}{S} \tag{18}$$

$$S = \sum_{i=1}^{64} f_i \tag{19}$$

$$l_i = w \cdot n_i = \frac{w \cdot f_i}{S} \tag{20}$$

In the hardware implementation, to reduce hardware usage, the present invention replaces part of the multiplication with shifts. Since the weight multiplied by a descriptor element and divided by the descriptor sum would round down to zero in integer arithmetic, the weight is first shifted left and divided by the descriptor sum, ensuring the quotient is greater than zero; the quotient is then multiplied by the descriptor element and shifted back right. Equation (20) is thus implemented in hardware to complete the descriptor normalization, as in equation (21):

$$l_i = \left(\frac{w \ll k}{S} \cdot f_i\right) \gg k \tag{21}$$

where $\ll k$ and $\gg k$ denote left and right shifts by $k$ bits.
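A bit-exact sketch of equation (21). The shift amount k and the use of the magnitude sum for S are assumptions for illustration; the specification fixes only w = 127:

```python
def normalize_shift(f, w=127, k=10):
    """Shift-based normalization: a single divider computes (w << k) // S
    once for the whole vector; each element is then multiplied by that
    quotient and shifted back, replacing 64 per-element divisions."""
    S = sum(abs(v) for v in f)            # assumption: magnitude sum
    scale = (w << k) // S                 # > 0 thanks to the pre-shift
    def scaled(v):                        # keep the shift a magnitude shift
        return (v * scale) >> k if v >= 0 else -((-v * scale) >> k)
    return [scaled(v) for v in f]
```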

Please refer again to FIG. 3. The input of the second SIFT module 200 of the present invention is a right-eye visual image; the module performs feature detection and description on the right-eye image and outputs right-eye image feature points. It further includes: a second image pyramid construction module 210; a second feature detection module 220; a second feature descriptor module 230; and a second selector 240.

The second image pyramid construction module 210 mainly finds the continuous scale-space differences needed for feature detection; it contains a second Gaussian image pyramid 211 and a second difference-of-Gaussian pyramid 212. The second Gaussian image pyramid 211 blurs the initial image through Gaussian filter modules (not shown) and removes image noise, ensuring that the resulting Gaussian images are not disturbed by noise in subsequent operations. The second difference-of-Gaussian pyramid 212 subtracts the Gaussian images pairwise to outline the image contours and preserve the image features as the basis for subsequent feature point detection. For details, please refer to the description of the first image pyramid construction module 110 above.

The second feature detection module 220 further includes: a second extremum detection module 221; a second high-contrast detection module 222; a second corner detection module 223; a second first-order partial-derivative matrix module; a second Hessian matrix module; a second adjugate matrix module; and a second determinant calculation module (the latter four are not shown in the figures; for details please refer to the analogous FIG. 5). A point is marked as a feature point only when the results of the second extremum detection module 221, the second high-contrast detection module 222, and the second corner detection module 223 are all satisfied. The second extremum detection module 221 detects both maxima and minima. For details of the second feature detection module 220, please refer to the description of the first feature detection module 120 above.

The outputs of the second extremum detection module 221, the second high-contrast detection module 222, and the second corner detection module 223 pass through a second AND gate 229 so that only the feature points required by the present invention are kept, improving the situation where feature points are hard to identify and susceptible to noise. When performing corner detection and high-contrast detection on the plane, the second corner detection module 223 needs second-order partial derivatives; in hardware, the present invention represents them with the second Hessian matrix module. For details please refer to the specification of ROC Invention Patent No. I592897, which is not repeated here. For the high-contrast computation, please refer to paragraph [0043] above.

The second feature descriptor module 230 uses the Gaussian images output by the image pyramid and accumulates orientation and gradient statistics for each point and its surroundings over a range. The second feature descriptor module 230 further includes: a second gradient calculation module 231, coupled to the second Gaussian image pyramid 211, for calculating the pixel gradients; a second orientation calculation module 232, coupled to the second Gaussian image pyramid 211, for calculating the pixel orientations; a second range statistics module 233, coupled to the second gradient calculation module 231 and the second orientation calculation module 232, for accumulating the pixel orientations and gradients; and a second normalization module 234, coupled to the second range statistics module 233, for normalizing the descriptor. For details, please refer to the description of the first feature descriptor module 130 above.

Please refer to FIG. 7, which is a detailed block diagram of the stereo feature matching module according to a preferred embodiment of the present invention.

As shown in the figure, the stereo feature matching module 400 further includes: a serial-to-parallel memory 410, coupled to the first SIFT module 100 and the second SIFT module 200, which stores the left- and right-image feature points and their coordinates sequentially into a plurality of registers (not shown), for example but not limited to eight registers, and outputs the feature-point information of those registers simultaneously, achieving serial-input/parallel-output operation; a minimum dimension calculation module 420, coupled to the serial-to-parallel memory 410, which, when a right-image feature-point signal arrives, checks whether the $y$ coordinates of the plurality of left-image feature points in the serial-to-parallel memory 410 are equal to the $y$ coordinate of the right-image feature point; a matching module 430, coupled to the minimum dimension calculation module 420, which finds the minimum of the values output by the minimum dimension calculation module 420 and finally checks whether that minimum is too large, rejecting it if it exceeds a threshold; and a depth calculation module 440, coupled to the matching module 430, which calculates the real-world distance of the feature point from the stereo vision camera, i.e., the depth, mainly by relating the distance between the matched points on the images to the real distance through similar triangles.
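The matching flow — same-row gate, minimum descriptor distance, threshold rejection, then similar-triangle depth — can be sketched as follows (the L1 descriptor distance and the parameter names are illustrative assumptions, not fixed by the specification):

```python
def match_and_depth(left_feats, right_feat, focal_px, baseline, max_dist):
    """left_feats: iterable of (x, y, descriptor); right_feat likewise.
    Compare only features on the same image row (equal y), keep the
    minimum L1 descriptor distance, reject it above max_dist, and
    return depth Z = f * B / (xL - xR) by similar triangles."""
    xr, yr, dr = right_feat
    best = None
    for xl, yl, dl in left_feats:
        if yl != yr:                       # epipolar gate: same row only
            continue
        dist = sum(abs(a - b) for a, b in zip(dl, dr))
        if best is None or dist < best[0]:
            best = (dist, xl)
    if best is None or best[0] > max_dist:
        return None                        # no acceptable match
    disparity = best[1] - xr
    if disparity <= 0:
        return None
    return focal_px * baseline / disparity
```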

Through the implementation of the stereoscopic image recognition and matching system of the present invention: the first and second image pyramid construction modules first find a plurality of Gaussian mask parameters at different scales in software and then perform a plurality of convolutions in parallel through a plurality of Gaussian filter modules, each convolution operating on the image data with one set of mask parameters to obtain a plurality of Gaussian images, overcoming the hardware floating-point operations and heavy computational cost that the exponential function incurs in conventional Gaussian-mask computation; the Hessian inverse-matrix operation uses the adjugate matrix, outputting the computed adjugate and determinant values to the low-contrast feature detection module and replacing a plurality of dividers with a numerical derivation; and the normalization module, when computing the normalized values of the feature-point vector, multiplies by a gain value and then uses a right-shift operation, greatly reducing the use of dividers. By reducing the amount of computation and improving the feature-point matching accuracy, the computational performance of the system is raised to achieve real-time stereoscopic image recognition and matching. The present invention is therefore indeed an improvement over conventional image recognition systems.

What is disclosed herein is a preferred embodiment; any partial change or modification that derives from the technical ideas of this application and is readily inferable by persons skilled in the art remains within the scope of the patent rights of this application.

In summary, in purpose, means, and effect alike, this application exhibits technical features distinct from the prior art, and the invention is both practical and compliant with the patentability requirements for a utility model. The examiners are respectfully requested to examine it closely and to grant a patent at an early date, to the benefit of society.

100: First SIFT module
110: First image pyramid construction module
111: First Gaussian image pyramid
112: First difference image pyramid
115: Mask
120: First function detection module
121: First extremum detection module
122: First high-contrast detection module
123: First corner detection module
124: First first-order partial differential matrix module
125: First Hessian matrix module
126: First adjoint matrix module
127: First determinant calculation module
128: Mask
129: First AND gate
130: First function descriptor module
131: First gradient calculation module
132: First orientation calculation module
133: First range statistics module
134: First normalization module
140: First selector
200: Second SIFT module
210: Second image pyramid construction module
211: Second Gaussian image pyramid
212: Second difference image pyramid
220: Second function detection module
221: Second extremum detection module
222: Second high-contrast detection module
223: Second corner detection module
229: Second AND gate
230: Second function descriptor module
231: Second gradient calculation module
232: Second orientation calculation module
233: Second range statistics module
234: Second normalization module
240: Second selector
300: Coordinate calculation module
400: Stereo function matching module
410: Serial-to-parallel memory
420: Minimum dimension calculation module
430: Matching module
440: Depth calculation module

FIG. 1 is a schematic diagram showing the overall arrangement of a stereoscopic image recognition and matching system according to a preferred embodiment of the present invention. FIG. 2 is a detailed block diagram of the first SIFT module of a preferred embodiment of the present invention. FIG. 3 is a detailed block diagram of the second SIFT module of a preferred embodiment of the present invention. FIG. 4 is a schematic diagram showing how the first SIFT module of a preferred embodiment of the present invention completes the Gaussian and difference images simultaneously with a parallel processing architecture. FIG. 5 is a schematic diagram of the hardware architecture for feature detection in the first function detection module of the first SIFT module of a preferred embodiment of the present invention. FIG. 6(a) is a schematic diagram of a trigonometric method with gradient calculation capability according to a preferred embodiment of the present invention. FIG. 6(b) is a schematic diagram of another trigonometric method with gradient calculation capability according to a preferred embodiment of the present invention. FIG. 7 is a detailed block diagram of the stereo function matching module of a preferred embodiment of the present invention.

100: First SIFT module

200: Second SIFT module

300: Coordinate calculation module

400: Stereo function matching module

Claims (8)

1. A stereoscopic image recognition and matching system, comprising: a first SIFT module, whose input is a left-eye visual image, for performing left-eye feature detection and description and outputting left-eye image feature points, wherein the first SIFT module further comprises: a first image pyramid construction module, coupled to the left-eye visual image, in which a plurality of Gaussian template mask parameters at different scales are found in advance by software and a plurality of Gaussian filter modules then perform a plurality of convolution operations in parallel, each convolution operating on the image data with one set of the mask parameters to obtain a plurality of Gaussian images, after which the Gaussian images are fed pairwise into a difference image module for Gaussian image subtraction; a first function detection module, coupled to the first image pyramid construction module, for performing extremum detection on the image data output by the first image pyramid construction module; a first function descriptor module, coupled to the first image pyramid construction module, for computing the descriptor of each pixel, finding the orientation and gradient of each point from its surrounding points and, using range statistics, accumulating the orientation gradients within the range to build a 64-dimensional descriptor; and a first selector, whose inputs are respectively coupled to the first function detection module and the first function descriptor module, for selecting one output; a second SIFT module, whose input is a right-eye visual image, for performing right-eye feature detection and description and outputting right-eye image feature points; a coordinate calculation module, coupled to the left-eye and right-eye visual image feature points, for calculating and outputting the image coordinates of the left-eye and right-eye image feature points; and a stereo function matching module, respectively coupled to the first SIFT module, the coordinate calculation module and the second SIFT module, for matching the left-eye image feature points and the right-eye image feature points according to their image coordinates and outputting the result; wherein the first function detection module further comprises: a first extremum detection module, coupled to the first difference-of-Gaussian pyramid, which obtains a high-pass image from the subtracted Gaussian-blurred images and uses the high-pass image to detect extremum features; a first high-contrast detection module, coupled to the first difference-of-Gaussian pyramid, further comprising a first first-order partial differential matrix module, a first Hessian matrix module, a first high-contrast feature detection module and a first corner detection module, the output signals of the first high-contrast feature detection module and the first corner detection module being combined by an AND operation to compute high-contrast features; a first corner detection module, coupled to the first difference-of-Gaussian pyramid, the output of the first Hessian matrix module entering the first corner detection module to compute corner features; and a first AND gate, whose inputs are respectively coupled to the first extremum detection module, the first high-contrast detection module and the first corner detection module, the feature points being obtained after the first AND-gate operation.

2. The stereoscopic image recognition and matching system of claim 1, wherein the second SIFT module further comprises: a second image pyramid construction module, coupled to the right-eye visual image, in which a plurality of Gaussian template mask parameters at different scales are found in advance by software and a plurality of Gaussian filter modules then perform a plurality of convolution operations in parallel, each convolution operating on the image data with one set of the mask parameters to obtain a plurality of Gaussian images, after which the Gaussian images are fed pairwise into a difference image module for Gaussian image subtraction; a second function detection module, coupled to the second image pyramid construction module, for performing extremum detection on the image data output by the second image pyramid construction module; a second function descriptor module, coupled to the second image pyramid construction module, for computing the descriptor of each pixel, finding the orientation and gradient of each point from its surrounding points and, using range statistics, accumulating the orientation gradients within the range to build a 64-dimensional descriptor; and a second selector, whose inputs are respectively coupled to the second function detection module and the second function descriptor module, for selecting one output.
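Claims 1 and 2 describe the same pyramid construction flow: Gaussian mask coefficients found offline by software, several convolutions of the same input run in parallel, and adjacent blur levels subtracted pairwise to form the difference-of-Gaussian images. A pure-software 1-D sketch of that flow (the two 3-tap masks are illustrative; the patent's hardware works on 2-D images with one filter module per mask):

```python
def convolve(signal, mask):
    """Same-length 1-D convolution with zero padding at the borders."""
    r = len(mask) // 2
    out = []
    for i in range(len(signal)):
        acc = 0
        for k, m in enumerate(mask):
            j = i + k - r
            if 0 <= j < len(signal):
                acc += signal[j] * m
        out.append(acc)
    return out

def build_pyramids(signal, masks):
    """Blur with each precomputed mask, then subtract adjacent blur
    levels pairwise (difference of Gaussians)."""
    gaussians = [convolve(signal, m) for m in masks]  # parallel in hardware
    dog = [[a - b for a, b in zip(gaussians[i + 1], gaussians[i])]
           for i in range(len(gaussians) - 1)]
    return gaussians, dog

# Two illustrative masks at increasing blur scales, applied to an impulse
masks = [[0.25, 0.5, 0.25], [1 / 3, 1 / 3, 1 / 3]]
g, d = build_pyramids([0, 0, 4, 0, 0], masks)
```

Because each mask is fixed ahead of time, no exponential function is evaluated at run time, which is the cost the claims say the prior art pays.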
3. The stereoscopic image recognition and matching system of claim 1, wherein the first image pyramid construction module further comprises: a first Gaussian image pyramid, the construction of which first requires building space images at continuous scales, the Gaussian-blurred images being obtained by convolving the initial image with the Gaussian templates; and a first difference-of-Gaussian pyramid, coupled to the first Gaussian image pyramid, in which consecutive Gaussian-blurred images are subtracted to produce difference-of-Gaussian images, the difference-of-Gaussian pyramid being obtained by repeating this operation.

4. The stereoscopic image recognition and matching system of claim 1, wherein the first function descriptor module further comprises: a first gradient calculation module, coupled to the first Gaussian image pyramid, for computing the pixel orientations; a first orientation calculation module, coupled to the first Gaussian image pyramid, for computing the pixel gradients; a first range statistics module, coupled to the first gradient calculation module and the first orientation calculation module, for accumulating the pixel orientations and gradients; and a first normalization module, coupled to the first range statistics module, for normalizing the descriptor.
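The gradient, orientation, and range-statistics chain in claim 4 can be paraphrased in software: take central differences around each pixel, convert them to a magnitude and an angle, and accumulate the magnitudes into orientation bins over a window. A minimal sketch (8 bins and a 3x3 window are illustrative choices, not the patent's exact parameters):

```python
import math

def gradient_and_orientation(img, x, y):
    """Central-difference gradient magnitude and orientation at (x, y)."""
    dx = img[y][x + 1] - img[y][x - 1]
    dy = img[y + 1][x] - img[y - 1][x]
    magnitude = math.hypot(dx, dy)
    orientation = math.atan2(dy, dx) % (2 * math.pi)
    return magnitude, orientation

def orientation_histogram(img, bins=8):
    """Range statistics: accumulate gradient magnitude into orientation
    bins over all interior pixels of the window."""
    hist = [0.0] * bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            mag, ori = gradient_and_orientation(img, x, y)
            hist[int(ori / (2 * math.pi) * bins) % bins] += mag
    return hist

img = [[0, 0, 0],
       [0, 5, 9],
       [0, 0, 0]]
hist = orientation_histogram(img)  # only interior pixel is (1, 1)
```

A full SIFT-style descriptor repeats this over several sub-windows and concatenates the histograms, which is how the 64 dimensions in claims 1 and 2 arise.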
5. The stereoscopic image recognition and matching system of claim 2, wherein the second image pyramid construction module further comprises: a second Gaussian blur pyramid, the construction of which first requires building space images at continuous scales, the Gaussian-blurred images being obtained by convolving the initial image with the Gaussian templates; and a second difference-of-Gaussian pyramid, coupled to the second Gaussian blur pyramid, in which consecutive Gaussian-blurred images are subtracted to produce difference-of-Gaussian images, the difference-of-Gaussian pyramid being obtained by repeating this operation.

6. The stereoscopic image recognition and matching system of claim 2, wherein the second function detection module further comprises: a second extremum detection module, coupled to the second difference-of-Gaussian pyramid, which obtains a high-pass image from the subtracted Gaussian-blurred images and uses the high-pass image to detect extremum features; a second high-contrast detection module, coupled to the second difference-of-Gaussian pyramid, further comprising a second first-order partial differential matrix module, a second Hessian matrix module, a second high-contrast feature detection module and a second corner detection module, the output signals of the second high-contrast feature detection module and the second corner detection module being combined by an AND operation to compute high-contrast features; a second corner detection module, coupled to the second difference-of-Gaussian pyramid, the output of the second Hessian matrix module entering the second corner detection module to compute corner features; and a second AND gate, whose inputs are respectively coupled to the second extremum detection module, the second high-contrast detection module and the second corner detection module, the feature points being obtained after the second AND-gate operation.

7. The stereoscopic image recognition and matching system of claim 2, wherein the second function descriptor module further comprises: a second gradient calculation module, coupled to the second Gaussian blur pyramid, for computing the pixel orientations; a second orientation calculation module, coupled to the second Gaussian blur pyramid, for computing the pixel gradients; a second range statistics module, coupled to the second gradient calculation module and the second orientation calculation module, for accumulating the pixel orientations and gradients; and a second normalization module, coupled to the second range statistics module, for normalizing the descriptor.
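Claim 6, like claim 1, gates three per-pixel boolean tests (extremum, high contrast, corner) through an AND gate, so only pixels passing all three detectors become feature points. In software the AND gate is simply a conjunction; the boolean inputs below are stand-ins for the hardware detector outputs:

```python
def and_gate_features(extremum, high_contrast, corner):
    """Keep only the pixels flagged by all three detector outputs."""
    return [e and h and c
            for e, h, c in zip(extremum, high_contrast, corner)]

flags = and_gate_features([True, True, False],
                          [True, False, True],
                          [True, True, True])
# → [True, False, False]: only the first pixel passes every test
```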
8. The stereoscopic image recognition and matching system of claim 1, wherein the stereo function matching module further comprises: a serial-to-parallel memory, respectively coupled to the first SIFT module and the second SIFT module, which stores a plurality of left- and right-image feature points and feature-point coordinates sequentially into a plurality of registers and simultaneously outputs the feature-point information of the registers, achieving a serial-in, parallel-out effect; a minimum dimension calculation module, coupled to the serial-to-parallel memory, which, when a right-image feature-point signal arrives, compares the y coordinates of the plurality of left-image feature points in the serial-to-parallel memory with the coordinate of the right image R to determine whether they are equal; a matching module, coupled to the minimum dimension calculation module, for finding the minimum value output by the minimum dimension calculation module and then determining whether the difference between the x-axis coordinates of the feature points of the two images is too large, the match being rejected if it exceeds a threshold; and a depth calculation module, coupled to the matching module, for computing the actual distance between the feature point and the stereo vision camera, that is, the depth, determined mainly by relating the distance between the matched points on the images to the actual distance through similar triangles.
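The matching pipeline of claim 8 can be paraphrased in software: for a right-image feature, consider only left-image features on the same row (equal y), take the candidate with the minimum descriptor distance, and reject the pair if the x difference exceeds a threshold. A sketch with illustrative descriptors, a sum-of-squared-differences distance, and an assumed 64-pixel disparity limit:

```python
def match_feature(right_feat, left_feats, max_disparity=64):
    """right_feat and left_feats are (x, y, descriptor) tuples.

    Returns the best same-row left-image match, or None if no candidate
    exists or the disparity gate rejects it.
    """
    rx, ry, rdesc = right_feat
    candidates = [f for f in left_feats if f[1] == ry]  # same scanline only
    if not candidates:
        return None
    # minimum sum-of-squared-differences over descriptors
    best = min(candidates,
               key=lambda f: sum((a - b) ** 2 for a, b in zip(f[2], rdesc)))
    if not (0 < best[0] - rx <= max_disparity):  # x-difference threshold
        return None
    return best

left = [(120, 50, [1, 2, 3]), (300, 50, [9, 9, 9]), (80, 60, [1, 2, 3])]
match = match_feature((100, 50, [1, 2, 2]), left)
# best same-row candidate is (120, 50, [1, 2, 3]); disparity 20 passes
```

The surviving pair's disparity then feeds the depth calculation module described at the end of the claim.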
TW109130809A 2020-09-08 2020-09-08 Stereoscopic Image Recognition and Matching System TWI765339B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109130809A TWI765339B (en) 2020-09-08 2020-09-08 Stereoscopic Image Recognition and Matching System

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW109130809A TWI765339B (en) 2020-09-08 2020-09-08 Stereoscopic Image Recognition and Matching System

Publications (2)

Publication Number Publication Date
TW202211681A TW202211681A (en) 2022-03-16
TWI765339B true TWI765339B (en) 2022-05-21

Family

ID=81747108

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109130809A TWI765339B (en) 2020-09-08 2020-09-08 Stereoscopic Image Recognition and Matching System

Country Status (1)

Country Link
TW (1) TWI765339B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201337835A (en) * 2012-03-15 2013-09-16 Ind Tech Res Inst Method and apparatus for constructing image blur pyramid, and image feature extracting circuit
TW201715882A (en) * 2015-10-16 2017-05-01 財團法人工業技術研究院 Device and method for depth estimation
CN108537235A (en) * 2018-03-27 2018-09-14 北京大学 A kind of method of low complex degree scale pyramid extraction characteristics of image

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201337835A (en) * 2012-03-15 2013-09-16 Ind Tech Res Inst Method and apparatus for constructing image blur pyramid, and image feature extracting circuit
TW201715882A (en) * 2015-10-16 2017-05-01 財團法人工業技術研究院 Device and method for depth estimation
CN108537235A (en) * 2018-03-27 2018-09-14 北京大学 A kind of method of low complex degree scale pyramid extraction characteristics of image

Also Published As

Publication number Publication date
TW202211681A (en) 2022-03-16

Similar Documents

Publication Publication Date Title
CN112364865B (en) A detection method for moving small objects in complex scenes
Li et al. Epi-based oriented relation networks for light field depth estimation
CN104504723B (en) Image registration method based on remarkable visual features
CN111709980A (en) Multi-scale image registration method and device based on deep learning
Ttofis et al. High-quality real-time hardware stereo matching based on guided image filtering
Duan et al. Weighted multi-projection: 3D point cloud denoising with tangent planes
Cambuim et al. Hardware module for low-resource and real-time stereo vision engine using semi-global matching approach
Tao et al. Robust point sets matching by fusing feature and spatial information using nonuniform Gaussian mixture models
Zhang et al. Ednet: Efficient disparity estimation with cost volume combination and attention-based spatial residual
CN106295710A (en) Image local feature matching process, device and terminal of based on non-geometric constraint
CN111161348A (en) A method, device and device for object pose estimation based on monocular camera
Ding et al. Real-time stereo vision system using adaptive weight cost aggregation approach
TWI765339B (en) Stereoscopic Image Recognition and Matching System
TWI592897B (en) Image Recognition Accelerator System
CN108447084A (en) Stereo matching compensation method based on ORB features
CN114549429A (en) Depth data quality evaluation method and device based on hypergraph structure
Huang et al. The common self-polar triangle of separate circles: properties and applications to camera calibration
Tang et al. A GMS-guided approach for 2D feature correspondence selection
Hamzah et al. Development of depth map from stereo images using sum of absolute differences and edge filters
Hamzah et al. Development of stereo matching algorithm based on sum of absolute RGB color differences and gradient matching
Tola Multiview 3D Reconstruction of a scene containing independently moving objects
Perez-Patricio et al. A fuzzy logic approach for stereo matching suited for real-time processing
Nazmi et al. Disparity Map from Stereo Images for Three-dimensional Surface Reconstruction
Sebe et al. Evaluation of intensity and color corner detectors for affine invariant salient regions
Fathi et al. Low-cost and real-time hardware implementation of stereo vision system on FPGA