
TW201331855A - High-speed hardware-based back-propagation feedback type artificial neural network with free feedback nodes - Google Patents


Info

Publication number
TW201331855A
Authority
TW
Taiwan
Prior art keywords
layer
feedback
output
neural network
hardware
Prior art date
Application number
TW101102121A
Other languages
Chinese (zh)
Inventor
Meng-Shen Cai
Ya-Yu Zhan
yu-xiang Liao
Original Assignee
Univ Nat Taipei Technology
Priority date
Filing date
Publication date
Application filed by Univ Nat Taipei Technology filed Critical Univ Nat Taipei Technology
Priority to TW101102121A priority Critical patent/TW201331855A/en
Publication of TW201331855A publication Critical patent/TW201331855A/en


Landscapes

  • Image Analysis (AREA)

Abstract

The present invention discloses a high-speed hardware-based back-propagation and feedback-type artificial neural network with free feedback nodes. The network includes a learning mechanism, and its ring serial architecture suits the parallel-computing character of artificial neural networks: all processing units in the same layer compute simultaneously under a single-instruction, multiple-data bus scheme. Through a segmented back-propagation computing architecture, different network sizes can be configured with a fixed amount of hardware, provided the memory capacity is sufficient. The design not only implements three- and four-layer back-propagation networks, but can also add feedback layers to form feedback-type networks, with multiplexers controlling the nodes of the feedback layer so that the network can freely choose which nodes to feed back. The activation function between layers can be chosen from sigmoid, hyperbolic tangent, and linear functions, so the network can be configured as different types of feedback and feed-forward artificial neural networks, further broadening the range of hardware applications.

Description

High-speed hardware-based back-propagation and feedback-type artificial neural network with free feedback nodes

The present invention relates to an artificial neural network, and more particularly to a high-speed hardware-based back-propagation and feedback-type artificial neural network with free feedback nodes, in which a multiplexer controls the nodes of the feedback layer so that the network can freely select which nodes to feed back, and the activation function between layers can be chosen from the sigmoid function, the hyperbolic tangent function, and a linear function, allowing the network to be configured as different types of feedback and feed-forward neural networks and broadening the range of hardware applications.

Prior art 1: referring to Figure 1, D. Hammerstrom proposed the X1 Array architecture, also known as CNAPS (Connected Network of Adapted Processors System). The advantage of CNAPS is that each processing node contains a built-in adder and multiplier and is controlled by instructions on an instruction bus; each node is equivalent to a simple arithmetic unit, giving great flexibility in computing the algorithms of different kinds of neural networks. When this architecture is applied to neural-network hardware development, the number of nodes is easy to adjust because all nodes share the same design. Unfortunately, the control instructions of this architecture are so complicated that software is required to compile them, there is no hardware dedicated to activation-function computation, and the generality of the architecture sacrifices computing speed.

Prior art 2: the architecture proposed in US005087826A uses one multiplier for every neuron link to compute the product of the input and the weight (x·w), forming a two-dimensional array. Although the computation is fast, it consumes a large amount of hardware and produces a large number of buses, which is unfavorable for design.

The architecture proposed in US005091864, while not as flexible as CNAPS, is more compact and easier to design and use; it improves computing speed, lowers development cost, simplifies the control unit, and shortens the development cycle. Input data are passed serially: a datum first enters the first processing unit and only reaches the second unit two cycles later. Because data arrive at each processing unit at different times, the control unit is harder to design. A commendable aspect is the reduced number of activation-function blocks. Exploiting the characteristics of the one-dimensional array, data are input and output one item per clock cycle, so the processing units never need to evaluate the activation function simultaneously; the activation function is therefore removed from the neurons and placed independently on the return path of the array, so a single activation-function block completes the computation without slowing it down. In addition, the architecture includes a set of shift registers that store the finished results and pass them back one by one; while the results are being shifted out, all processing units can immediately start on the next datum, saving considerable time. The drawback of this architecture is that it has no learning mechanism; it can only perform recall on an already trained network.

The architecture proposed in US005799134 is similar in concept to US005091864, but the input data are connected in parallel, i.e. all processing units receive the same input signal at the same moment. A subtractor and a multiplexer added to each processing unit make the computation more flexible. However, it likewise has no learning mechanism and can only perform the recall function of a neural network.

WO 2008/067676 A1 proposes an adjustable artificial neural network system and architecture. The system performs the forward and backward computations of the network in segments, and the segmentation can be adjusted as required. However, the actual hardware chip, the number of segments, and the logic-element usage limits must be entered into a software program, which then generates the required hardware description language code; after compilation, the code is downloaded to the chip. Consequently, once the chip has been programmed, the number of segments cannot be changed.

WO 2010/106116 proposes an analog artificial neural network circuit composed of nano-devices, CMOS circuits, and conductors arranged in columns. The small chip area of this architecture allows high-density, complex circuits with low power consumption. However, the system has no concept of segmentation; building a neural network with a different number of nodes requires redesigning the system architecture.

It can thus be seen that the above conventional designs still suffer from many shortcomings and are in urgent need of improvement.

In view of the shortcomings of the above conventional technologies, the inventors sought to improve and innovate, and after years of dedicated research finally succeeded in developing the present high-speed hardware-based back-propagation and feedback-type artificial neural network with free feedback nodes.

The focus of the present invention is to use multiplexers to control the nodes of the feedback layer so that the network can freely select which nodes to feed back, and to let the activation function between layers be chosen from sigmoid, hyperbolic tangent, and linear functions, so that the network can be configured as different types of feedback and feed-forward neural networks.

3.1 Hardware Architecture and Principle

3.1.1 Forward Computation Hardware Architecture

Both feedback-type and back-propagation neural networks are multi-layer perceptron architectures, and the computation of each layer of neurons proceeds layer by layer. If the hardware were designed layer by layer, the hardware cost and logic-element usage would grow in proportion to the network size. Instead, the data of the previous layer can first be stored in memory and transmitted over the input bus when needed; the results of the following layer are likewise stored in memory and read back onto the layer input bus for the next layer's computation. This approach reduces the amount of hardware used.

The present invention is based on a ring serial architecture. Using a single-layer hardware structure, it processes the input-to-hidden-layer computation and the hidden-to-output-layer computation in stages to complete one forward pass, which lowers cost. However, the forward computation consists of many processing units, and the logic-element limits of real hardware restrict the size of network that can be built, reducing the practicality of the architecture. The present invention therefore also computes the forward pass in segments: through the controller design, the number of processing units is fixed and the memory capacity is exploited, so that the system can realize a large neural network with a limited number of processing units.

Referring to Figure 2, the forward computation architecture: after hardware initialization, the forward pass begins. The input-layer data are sent over the input bus to the processing units of the next layer and multiply-accumulated with the weight values in each unit's memory; the result is then computed by the activation-function block, which outputs the activation value and its derivative, both latched in registers. The shift signal is then enabled, and each unit's result is passed over the bus to the next unit until the last unit has output every unit's result to external memory. The values stored in external memory serve as the input of the next layer; the above steps are repeated over the data bus to compute the next layer's output. In this way, a single-layer hardware structure completes a multi-layer forward pass.
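The single-layer-reuse forward pass described above can be sketched as follows. This is a behavioural model in software, not the disclosed hardware; the specific weights and sizes are illustrative assumptions.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer_forward(inputs, weights, biases):
    """One stage of the forward pass: every processing unit multiply-
    accumulates the shared input bus with its own weight memory, then
    the activation-function block outputs both the activation value and
    its derivative (both are latched for the later backward pass)."""
    outputs, derivs = [], []
    for w_row, b in zip(weights, biases):
        net = sum(w * x for w, x in zip(w_row, inputs)) + b  # MAC
        out = sigmoid(net)                                   # activation block
        outputs.append(out)
        derivs.append(out * (1.0 - out))                     # derivative output
    return outputs, derivs

# The same single-layer hardware is reused for both stages: the hidden
# outputs are "stored in external memory" and fed back as the next input bus.
x = [0.5, -0.2, 0.1]
w1 = [[0.1, 0.2, -0.1], [0.3, -0.2, 0.1], [0.0, 0.1, 0.2], [0.2, 0.2, 0.2]]
b1 = [0.1, 0.0, -0.1, 0.05]
w2 = [[0.1, -0.3, 0.2, 0.1], [0.2, 0.1, -0.1, 0.3]]
b2 = [0.0, 0.1]
hidden, hidden_d = layer_forward(x, w1, b1)       # input layer -> hidden layer
output, output_d = layer_forward(hidden, w2, b2)  # hidden layer -> output layer
```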

When the preset number of processing units is insufficient for the desired network size, the present invention uses segmented computation to synthesize a larger network. Referring to the segmented-computation flow chart of Figure 3: if the hardware provides at most 2 processing units but 5 neurons must be computed, the ring serial architecture first computes the output values of the first two neurons, stores them in memory, and then continues with the remaining neurons.
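The Figure 3 scenario (2 physical units, 5 neurons) can be sketched as a behavioural model; the external-memory list stands in for the hardware's result storage and is an assumption of this sketch.

```python
def segmented_layer(inputs, weights, biases, num_units=2):
    """Segmented computation: only `num_units` physical processing units
    exist, so the layer's neurons are computed `num_units` at a time,
    each segment's results being stored in (simulated) external memory
    before the next segment runs."""
    external_memory = []
    for start in range(0, len(biases), num_units):
        for w_row, b in zip(weights[start:start + num_units],
                            biases[start:start + num_units]):
            net = sum(w * x for w, x in zip(w_row, inputs)) + b
            external_memory.append(net)  # shifted out to memory
    return external_memory

# 5 neurons with only 2 processing units -> segments of 2, 2 and 1 neurons.
ins = [1.0, 2.0]
ws = [[i * 0.1, i * 0.1] for i in range(5)]
bs = [0.0] * 5
seg_result = segmented_layer(ins, ws, bs, num_units=2)
```

The segmented result is identical to computing all five neurons at once; segmentation only changes how many neurons are in flight per pass.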

3.1.2 Backward Computation Hardware Architecture

Referring to Figure 4, the backward computation architecture mainly implements the backward pass of the back-propagation algorithm. The hardware is divided into four blocks: computing the output-layer δ, computing the hidden-layer δ, computing Δw, and updating the weights. When the backward pass begins, the system first sends the target values, together with the output values and derivatives obtained from the forward pass, to the output-layer δ block. This block follows the back-propagation learning rule for the output-layer δ: the forward-pass result is subtracted from the target value and the difference is multiplied by the derivative, yielding the output-layer error δ.
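The output-layer δ block reduces to one subtraction and one multiplication per neuron; a minimal behavioural sketch:

```python
def output_layer_delta(targets, outputs, derivs):
    """Output-layer δ block: subtract the forward-pass result from the
    target, then multiply by the derivative latched during the forward
    pass."""
    return [(t - o) * d for t, o, d in zip(targets, outputs, derivs)]

deltas = output_layer_delta([1.0, 0.0], [0.8, 0.3], [0.16, 0.21])
```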

Once the output-layer error δ has been computed, the hidden-layer error computation continues. The hidden-layer error expression involves a multiply-accumulate similar in concept to the forward pass, except that neither the activation function nor the neuron bias is needed. Referring to the prior art, the present invention reuses the ring serial architecture and parallel processing of the forward pass, omitting the activation-function portion of the forward architecture. The output-layer δ and the weights of the processing units are multiplied and accumulated through the ring serial architecture, and the result, together with the derivatives of the hidden-layer neurons from the forward pass, is sent to the hidden-layer δ block. This block needs only one multiplier; the product of the two gives the hidden-layer error. Likewise, in a four-layer back-propagation network the hidden-layer errors are computed the same way, so the same hardware and flow can compute the neuron errors between the second and first hidden layers.
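A behavioural sketch of the hidden-layer δ computation: the ring serial MAC (no activation, no bias) accumulates δ·w per hidden neuron, and a single multiplier then applies the stored derivative. The weight indexing convention (weights[k][j] links hidden neuron j to output neuron k) is an assumption of this sketch.

```python
def hidden_layer_delta(output_delta, weights, hidden_derivs):
    """Hidden-layer δ block: MAC of the output-layer deltas against the
    output-layer weights, then one multiplication by the hidden
    neuron's derivative from the forward pass."""
    deltas = []
    for j, d in enumerate(hidden_derivs):
        acc = sum(dk * w_row[j] for dk, w_row in zip(output_delta, weights))
        deltas.append(acc * d)                 # the single multiplier
    return deltas

h_deltas = hidden_layer_delta([0.5], [[0.2, -0.4]], [0.25, 0.25])
```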

After the output-layer and hidden-layer errors have been computed, the Δw block receives a command from the system management unit and begins computing the weight corrections. The inputs of the Δw block are the layer errors stored in the δ memory, the learning rate, and the input bus; the data transmitted over the input bus include each neuron's bias and the previous layer's output values. Each time a value is read from the δ memory, the bias is first sent over the input bus to the Δw block to compute the bias correction; the previous layer's output values are then sent over the input bus to compute the weight corrections. The Δw block first computes the corrections for the weights between the output layer and the hidden layer. Finally, the computed corrections, together with the weight values of the previous iteration, are sent to the weight-update block, where an adder sums the two to produce the updated values, completing one backward pass.
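The Δw block and the weight-update block together implement the standard back-propagation update Δw = η·δ·(previous-layer output), with the bias treated as a weight on a constant input of 1; a behavioural sketch under those standard assumptions:

```python
def update_weights(weights, biases, deltas, prev_outputs, lr=0.5):
    """Δw block plus weight-update block: compute the corrections from
    the δ memory, the learning rate, and the previous layer's outputs,
    then an adder sums them with the previous iteration's weights."""
    new_w = [[w + lr * d * o for w, o in zip(w_row, prev_outputs)]
             for w_row, d in zip(weights, deltas)]
    new_b = [b + lr * d for b, d in zip(biases, deltas)]  # bias input is 1
    return new_w, new_b

nw, nb = update_weights([[1.0]], [0.0], [0.2], [0.5], lr=1.0)
```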

3.1.3 Processing Unit

Referring to Figure 5, the internal hardware architecture of the processing unit: the ring serial architecture of the present invention is formed by chaining many processing units, each of which functions like a neuron in a neural network. A processing unit obtains input and weight values over the input bus and the weight bus, computes their multiply-accumulate, and performs a nonlinear transformation through the activation-function block. The unit's internal architecture consists of four blocks: memory, multiply-accumulator, shift register, and activation function. These four blocks operate independently and are pipelined to speed up the unit's overall computation.

The queue memory inside the processing unit stores the weight values needed for the weighted computation. The forward and backward passes read the weights in different orders, so separate memories hold the weight indices for the forward and backward computations. So that the weight values in each unit are read out in the correct order, at system initialization the weight memory address management unit arranges the weight addresses for the three cases of forward computation, backward computation, and weight update. Each processing unit has its own address number, which directs the data on the weight bus to be stored in the correct unit; that is, the unit's memory write signal must match the signal on the address bus. When a unit reads a weight value from its memory, it passes the value to the internal multiply-accumulator. Because all units receive input data from the input bus simultaneously for the multiply-accumulate, no address decoding is needed there. The multiply-accumulator multiplies and sums the input data and weight values. The forward pass requires the accumulated value to be transformed by the activation function; the activation-function hardware in the unit is divided into a sigmoid block with its derivative and a hyperbolic tangent block with its derivative. The result of the multiply-accumulator is passed to the activation-function block for conversion.

Each processing unit has an internal shift control signal that determines whether the unit's computed result is output to the next unit. After the units finish their computation, the results are shifted along, each unit passing its result to the next, and the last unit sends the results over the bus to memory.
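A behavioural sketch of the shift chain: each enabled shift cycle moves every latched result one unit along, and only the last unit drives the memory bus, so external memory receives the results in reverse chain order (the ordering is an assumption of this model).

```python
def shift_results_out(pe_results):
    """Model of the shift-register chain: on each shift cycle the last
    processing unit outputs its value to memory and every other unit
    passes its value one step toward the end of the chain."""
    chain = list(pe_results)
    received = []
    for _ in range(len(chain)):
        received.append(chain[-1])    # last unit outputs to memory
        chain = [None] + chain[:-1]   # every register shifts one step
    return received
```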

3.1.4 Numerical System

Referring to Figure 6, the fixed-point fractional representation: all data computation and processing in the present invention use binary signals. For numerical computation, the prior art compared the training error of finite-precision integer arithmetic against floating point and found little difference between the two; the prior art also showed that integer arithmetic reduces computation time relative to floating point and does not require many significant bits. Based on these results and on considerations of hardware resources such as logic gates and buses, the present invention adopts a 32-bit signed fixed-point binary code as the numerical encoding of the architecture. The Nios II microprocessor, however, represents fractions as IEEE 754 floating point, so when the hardware receives commands and network parameters from the Nios II, a floating-point-to-fixed-point converter must first convert the values. The numbers of integer and fractional bits can be changed by the user.
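The float-to-fixed conversion can be sketched as follows. The 16/16 integer/fraction split is an assumption for illustration; as stated above, the invention lets the user choose the split.

```python
def to_fixed(value, frac_bits=16, total_bits=32):
    """Quantise a float to a two's-complement fixed-point integer with
    `frac_bits` fractional bits, saturating at the signed 32-bit range."""
    raw = int(round(value * (1 << frac_bits)))
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, raw))

def from_fixed(raw, frac_bits=16):
    """Convert a fixed-point integer back to a float."""
    return raw / (1 << frac_bits)
```

A round trip loses at most half a least-significant fractional bit, i.e. 2^-17 here.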

3.1.5 Stack, Queue, and Random-Access Memory

During computation the system needs memory to record the input and output values required by each layer of the network. The present invention uses three different hardware structures to store data: stack, queue (FIFO), and random-access memory. Stacks and queues need no memory address when storing, which eases the design burden, but they are less flexible: data are stored in order and can only be read in a specific order. Random-access memory requires a memory address for storing or reading; providing the read address retrieves the data. Table 1 lists the memory functions provided by the present invention. The control signals of the stack and queue hardware are as follows:

clear: in the stack architecture, the write position returns to the lowest position and the read position to the highest; in the queue architecture, both the write and read positions return to the lowest position.

hold: in the stack and queue architectures, records the current memory position.

restart: used together with the hold signal; in the stack and queue architectures, returns the read position to the position previously recorded by hold.

rd_addr: the memory address from which data are read.
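The queue (FIFO) signals above can be modelled behaviourally; mapping the clear/hold/restart signals onto method calls, and the fixed depth, are assumptions of this sketch.

```python
class HardwareFifo:
    """Behavioural model of the queue (FIFO) block with the clear /
    hold / restart control signals described above."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.wr = 0        # write pointer
        self.rd = 0        # read pointer
        self.saved_rd = 0  # position recorded by hold

    def clear(self):
        # Queue semantics: write and read positions back to the lowest position.
        self.wr = self.rd = 0

    def hold(self):
        # Record the current read position.
        self.saved_rd = self.rd

    def restart(self):
        # Return the read position to the position set by the previous hold.
        self.rd = self.saved_rd

    def write(self, value):
        self.mem[self.wr] = value
        self.wr += 1

    def read(self):
        value = self.mem[self.rd]
        self.rd += 1
        return value
```

hold/restart let the backward pass re-read a stretch of stored values (e.g. a layer's outputs) without re-writing them.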

3.1.6 Activation Function Hardware Architecture

The activation function is implemented in hardware; common methods in the literature are the look-up table method, table look-up with line-segment interpolation, and the piecewise linear method. The look-up table method is one of the most common ways to implement an activation function in hardware; the prior art notes that it needs only three layers of hardware operations to obtain the activation-function value. Its advantages are simple design and fast computation, but its hardware cost is proportional to the accuracy: when the activation-function conversion requires more precise data, the hardware needs more memory, and comparing more entries also takes more execution time. Table look-up with line-segment interpolation cuts the curve with straight lines whose lengths vary with the break points; a break point is optimal when the Euclidean distance between it and the previous break point is smaller than a specified error. For a schematic, see Figure 7, which compares conventional interpolation with line-segment interpolation. The piecewise linear method [48] uses the Centered Recursive Interpolation (CRI) algorithm to approximate the sigmoid curve; the algorithm is shown in Table 2, where Δ is the interpolation depth, x the input value, and q the number of iterations. The prior art indicates that the best solution is obtained with q = 2 and Δ set to 0.28094. The algorithm computes only the x ≤ 0 part of the sigmoid; the other half is obtained from the symmetry of the function, which reduces logic-element usage. The hardware architecture is shown in Figure 8. The piecewise linear method first finds a break point; the error then gradually grows, and when it reaches a limit a new break point is sought. Figure 9 is a schematic of the piecewise linear method, showing that it finds the sigmoid value by interpolation.
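To make the table-look-up-with-interpolation idea concrete, here is a software sketch of that method (not the CRI algorithm of Table 2, whose exact recurrences are not reproduced in this text); the table size and input range are assumptions.

```python
import math

def build_sigmoid_table(entries=64, x_max=8.0):
    """Pre-computed sigmoid samples for x in [0, x_max]; the hardware
    analogue is a small ROM of sample points."""
    step = x_max / entries
    table = [1.0 / (1.0 + math.exp(-i * step)) for i in range(entries + 1)]
    return table, step

def sigmoid_lut(x, table, step):
    """Sigmoid via table look-up plus linear interpolation between the
    two nearest samples; negative inputs use the symmetry
    sigma(-x) = 1 - sigma(x), so only one half is tabulated."""
    if x < 0.0:
        return 1.0 - sigmoid_lut(-x, table, step)
    i = int(x / step)
    if i >= len(table) - 1:
        return table[-1]                 # saturate beyond the table
    t = (x - i * step) / step
    return table[i] + t * (table[i + 1] - table[i])
```

With 64 entries the interpolation error stays well below the cost of storing a much denser plain look-up table, which is the trade-off the text describes.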

Table 2: Piecewise linear algorithm

3.1.7 Sigmoid Function Hardware Architecture

The present invention computes the activation-function values with the prior-art activation-function hardware. The prior art modifies the structure of the piecewise linear method, exploiting the reusability of the hardware to reduce the resources required. Taking the piecewise linear algorithm of Table 2 as an example, in the first computation at i = 0 the initial value of g(x) is 0. Implementing equation (3.2) in hardware requires one adder and two shift registers, while equation (3.4) requires three adders and three shift registers. Equation (3.1) is rewritten as equation (3.2), deferring the factor of one quarter until equation (3.3) before computing. This gives the same result, but saves two shift registers relative to the original equation (3.1). Moreover, because the initial value of g(x) is 0, the addition of 0 in equation (3.3) can be omitted outright, and substitution into equation (3.2) yields equation (3.4). Relative to the original design this saves one more adder and also shortens the clock cycles needed for the computation.

In the algorithm of Table 2, the larger the iteration count q, the closer the result approaches the sigmoid value, but the longer the computation takes. Experiments show that two iterations give the best overall benefit for the network. With the iteration count fixed at 2, equation (3.5) of Table 2 is therefore replaced by a register storing a constant, eliminating the hardware that equation (3.5) would require. Instead, in the second iteration equation (3.3) is changed to equation (3.6), and the original 2-bit shift of Δ is omitted. In the first and second iterations, constant registers holding Δ1 = 0.979721 and Δ2 = 0.197876 are added to equations (3.4) and (3.6). The resulting hardware design is shown in Figure 10.

3.1.8 Hyperbolic Tangent Function Hardware Architecture

The hyperbolic tangent function has a shape similar to the sigmoid; the difference lies in their ranges: the sigmoid's range is 0 to 1, while the hyperbolic tangent's is -1 to 1. The piecewise linear method composes the curve from many interpolated line segments; the Δ value in the algorithm and the slopes of the equations determine the curve the method synthesizes.

Adjusting the constant-register values and the shift registers of the previous subsection's hardware architecture implements the hyperbolic tangent function; the resulting architecture is shown in Figure 11. The Δ values for the two iterations were determined by software testing: Δ1 = 0.476807 and Δ2 = 0.188034.

3.1.9 Symmetry Properties of the Activation Functions

Both the hyperbolic tangent and sigmoid functions realized by the piecewise linear method are symmetric, as shown in Figures 12 and 13. Following Equations (3.7) and (3.8), the other half of each function can be obtained with a subtractor and a multiplexer's selection. This approach cuts logic-element usage by nearly half. The hardware architectures of the previous two subsections then become those of Figures 14 and 15.
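A short sketch of the symmetry trick, with `sigmoid_pos` standing in for the piecewise-linear block that only evaluates x ≥ 0 (the exact Equations (3.7) and (3.8) are not reproduced here, so the standard identities sigmoid(-x) = 1 - sigmoid(x) and tanh(-x) = -tanh(x) are assumed):

```python
import math

def sigmoid_pos(x):
    # placeholder for the piecewise-linear hardware block, evaluated on x >= 0
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_full(x):
    # multiplexer: negative inputs reuse the positive-half result,
    # with a subtractor producing 1 - y
    y = sigmoid_pos(abs(x))
    return y if x >= 0 else 1.0 - y

def tanh_full(x):
    # odd symmetry: the subtractor computes 0 - y for negative inputs
    y = math.tanh(abs(x))
    return y if x >= 0 else -y
```

Because only the x ≥ 0 half of each curve is stored, the segment tables and their logic elements shrink by roughly half, at the cost of one subtractor and one multiplexer.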

3.2 Management Units

The management units control the system flow and sequence the system's key operations. The system architecture of the present invention contains five management units with distinct functions: ordering the weight-memory addresses, managing processing-element addresses, sequencing target-value accesses, deciding whether feedback-layer neuron nodes output, and controlling the overall system flow. Each management unit is designed as a finite state machine: according to the current state it performs the behavior programmed for that state and then enters the next specified state. Besides controlling the overall system flow, the system management unit also handles the segmented-computation mechanism and the system's timing control.

3.2.1 Weight-Memory Address Management Unit

Within one complete training pass of the back-propagation error learning method, viewed from the forward and backward directions, the weight values are arranged in three orders: first, the order in which weights are used during the forward pass; second, the order used during the backward pass; third, the order used when the weights are updated. To match the shared hardware of the ring-serial architecture and make it easy for the system to locate weight-memory addresses, a weight-memory address management unit is designed for each of the three orders, responsible for storing the weight-memory addresses in advance in random-access memory. When the system is executing one of these states, the central management unit instructs the weight-memory address management unit to output the weight-memory addresses in order, and the system uses each address to fetch the required weight value from the designated memory.

The hardware block diagram of the weight-memory address management unit is shown in Figure 16; it contains three random-access memories, each responsible for storing one of the address orderings. The unit receives commands from the system management unit. When the initialize pin is asserted, the LearningAddr_FIFO and UpdatingAddr_FIFO blocks begin arranging the weight-memory addresses for the three cases above according to the input parameters. The key signals of the weight-memory address management unit are described below:

1. Initialize: the enable pin of the weight-memory address management unit. Once all the network parameters the hardware needs have been set, each block is instructed to begin arranging and storing the weight-memory addresses.

2. clear: when this signal is asserted, the read/write addresses of the memory storing the first ordering return to the lowest position.

3. LA_FIFO_clear: when this signal is asserted, the read/write addresses of the memory storing the second ordering return to the lowest position.

4. UA_FIFO_clear: when this signal is asserted, the read/write addresses of the memory storing the third ordering return to the lowest position.

5. Done: when the weight management unit has finished arranging the weight-memory addresses, this signal goes high, indicating that the block's work is complete.

6. kind: a signal recording the current network type. "000" denotes a three-layer back-propagation neural network; "001" a four-layer back-propagation neural network; "010" a three-layer feedback-type back-propagation neural network; "011" an Elman neural network; "110" a three-layer feedback-type back-propagation neural network with a feedback layer added to the hidden layer; and "111" an Elman neural network with a feedback layer added to the hidden layer. The weight management unit uses this signal to decide whether feedback-layer weight ordering is needed.
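The kind encoding and the decision it drives can be captured as a small lookup table; this is a sketch of the decode logic only (the helper names are illustrative), with the feedback-ordering decision following directly from which codes denote feedback-type variants:

```python
NETWORK_KIND = {
    "000": "three-layer back-propagation network",
    "001": "four-layer back-propagation network",
    "010": "three-layer feedback-type back-propagation network",
    "011": "Elman network",
    "110": "three-layer feedback-type network with hidden-layer feedback layer",
    "111": "Elman network with hidden-layer feedback layer",
}

def needs_feedback_ordering(kind):
    # the weight management unit arranges feedback-layer weights
    # only for the feedback-type variants
    return kind in ("010", "011", "110", "111")
```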

7. Rd_opcode: this signal selects which weight ordering the weight management unit outputs. When it is "00", the weights leave memory in the first (forward) order; "01" selects the second (backward) order; and "10" selects the third order, used when the network updates its weights.

The first ordering of weight-memory addresses is stored sequentially in queue (FIFO) memory. When the system starts executing, the system management unit first instructs each block to initialize; the weight control block then begins operating, writing the address orderings for the first, second, and third cases into the queue memories.

The second ordering of weight-memory addresses targets the backward pass of the back-propagation learning algorithm. The backward-pass formula for the hidden-layer error performs the same weight multiply-accumulate operation as the forward pass, so this part of the computation can share hardware and be carried out on the ring-serial architecture. The backward pass proceeds from the output layer toward the hidden layer, so the weight addresses are arranged from the output layer back to the hidden layer. Matching the algorithm, the hardware design's main considerations are that the bias weights need not be arranged and that the hidden-layer-to-input-layer weights need not be arranged. When reading memory, a ring-serial processing element may read ahead into the weights needed by the next neuron's computation; to prevent this, the same number of weight values is padded at the corresponding memory locations of the same layer. Figure 18 shows the backward-pass arrangement state diagram of the weight-memory address management unit, and Table 3 explains Figure 18.

The third weight-address ordering is used when the neural network updates its weight values. This ordering is used when computing the weight corrections during the backward pass, making it easy for the system to locate in memory the weights that need updating. Because the back-propagation algorithm corrects weights layer by layer from the output layer at the end of the network toward the hidden layers, the weight addresses are stored in memory in reverse order.
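Assuming each layer's weights occupy a contiguous block of addresses (an assumption for illustration; the document does not specify the layout), the forward and update orderings can be sketched as follows, with the update ordering emitting the per-layer blocks in reverse, output layer first:

```python
def layer_address_blocks(weights_per_layer):
    """Contiguous address block for each layer's weights, input layer first."""
    blocks, start = [], 0
    for n in weights_per_layer:
        blocks.append(list(range(start, start + n)))
        start += n
    return blocks

def forward_order(weights_per_layer):
    # forward pass: addresses in natural, input-to-output order
    return [a for blk in layer_address_blocks(weights_per_layer) for a in blk]

def update_order(weights_per_layer):
    # weight update: corrections run from the output layer back toward
    # the input, so the per-layer blocks are emitted in reverse
    return [a for blk in reversed(layer_address_blocks(weights_per_layer))
            for a in blk]
```

For a network with 4 input-to-hidden weights and 6 hidden-to-output weights, the update ordering would read addresses 4–9 before 0–3.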

Take the modified Elman neural network as an example: one input-layer neuron, three feedback neurons at the input layer, three hidden-layer neurons, two feedback neurons at the hidden layer, two output-layer neurons, and a hardware preset maximum of two processing elements, as shown in Figure 17. Its weight-memory address arrangement is shown in Figure 19. Before the ring-serial architecture begins operating, the weight values are first stored in the processing elements' memories, as shown in Figure 20. When the input values are delivered to the ring-serial architecture over the input data bus, the processing-element array simultaneously reads out, one by one, the weights stored in processing elements 1 and 2 and multiply-accumulates them with the inputs. For example, when computing the hidden-layer outputs, processing element 1 reads W04, W14, W24, W34, W44 in sequence while processing element 2 outputs W05, W15, W25, W35, W55, and both are multiply-accumulated against the input values. The results are stored in the hidden-layer memory. The system then checks whether all neurons in the layer have been computed; if not, it performs segmented processing, continuing to read the weights W06, W16, W26, W36, W66 stored in processing element 1 and X, X, X, X, X stored in processing element 2. X is a placeholder value the weight management unit uses to pad out the weight count. When the preset number of processing elements is not a factor of the total number of neurons in a layer, a processing element would otherwise read the wrong weights, fetching the next neuron's weights too early. Therefore processing element 2's weight memory must be padded to hold the same number of weights as processing element 1.
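The padding rule can be sketched directly: when the processing-element count does not divide the neuron count, whole rows of placeholder weights are appended so that every element reads the same number of values in every segmented pass (function and placeholder names are illustrative):

```python
PAD = None  # stands for the placeholder value "X" in the text

def pad_weights(per_neuron_weights, n_pe):
    """Pad the list of per-neuron weight rows so its length is a
    multiple of the processing-element count n_pe."""
    padded = list(per_neuron_weights)
    while len(padded) % n_pe != 0:
        # a full dummy row keeps every PE's read count identical
        padded.append([PAD] * len(per_neuron_weights[0]))
    return padded
```

With three hidden neurons and two processing elements, one dummy row is added, matching the X, X, X, X, X sequence read by processing element 2 in the second segment.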

3.2.2 Address Management Unit

The address management unit is the controller responsible for managing processing-element addresses. It numbers the processing-element addresses according to the network type and the hardware's preset maximum number of processing elements. The ring-serial architecture consists of processing elements connected in parallel; each processing element has an independent address. When a weight is to be written into a given processing element's memory, the address on the address bus directs it into the correct processing element.

Because the network type, the number of network neurons, and the hardware's preset maximum number of processing elements vary, the weight values must be stored in the designated processing elements before the ring-serial architecture performs any computation. Taking Figure 21 as an example, to write weights W04, W14, W24, W34, W44 into the memory of processing element 1, the signal on the address bus must be 1 while those weights are placed on the weight bus; they are then written into processing element 1. Next, to write weights W05, W15, W25, W35, W55 into processing element 2 over the weight bus, the address bus must be set to 2, as shown in Figure 22.

3.2.3 Target-Value Management Unit

The back-propagation learning algorithm adopted by the present invention is supervised learning; the goal of learning is to reduce the gap between the network's output values and the target values. To reduce this gap, the weights between neurons must be adjusted through the backward pass. Because the backward pass computes the output-layer error starting from the last output-layer neuron, the order in which the target values are stored in queue memory must be considered; otherwise, when the error of the last output-layer neuron is computed, the first target value read from the target-value memory will not be the one that computation needs. When the number of target values equals the number of output-layer neurons, the problem can be solved by switching to stack memory, whose first-in-last-out ordering makes the last stored target value the first one read, exactly the value needed to compute the first output-layer error. But when the number of target values differs from the number of output-layer neurons, that is, when there is more than one group of training samples, the stack-memory approach no longer applies: the last group's target values would become the first group used to compute the output-layer error.
Considering these two ordering problems, the present invention designs a controller that manages the memory addresses at which target values are stored, so that at initialization the target values need only be entered in natural order: first group's first neuron, first group's second neuron, and so on. The controller then, according to the number of output-layer nodes and the total number of target values, produces a target-value address ordering suited to the backward pass. The target values stored at initialization are read out in this sorted address order and transferred into the target-value stack memory. Once storage is complete, the stack memory's contents correctly match the backward pass's output-layer error computation.
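The reordering the controller produces can be sketched as follows: within each group of targets the order is reversed, so that reading the result delivers each group's last output neuron first (the function name is illustrative; the sketch returns the final read order rather than modeling the stack hardware):

```python
def stack_read_order(targets, n_out):
    """Reorder a flat queue of target values (group-major, neuron order
    t0, t1, ...) so that, per group, the last output neuron's target
    is read first, matching the backward pass."""
    assert len(targets) % n_out == 0, "targets must form whole groups"
    order = []
    for g in range(len(targets) // n_out):
        group = targets[g * n_out:(g + 1) * n_out]
        order.extend(reversed(group))  # last neuron of the group first
    return order
```

For two output neurons and two groups, the queue order t0, t1, t2, t3 becomes the read order t1, t0, t3, t2, which is exactly the sequence described for Figures 23 and 24.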

Suppose the network's output layer has two neurons and the problem has four groups of target-value data; at initialization the target-value queue memory holds the target values and memory addresses shown in Figure 23. The backward pass of the back-propagation learning algorithm first computes the output-layer error starting from t1 at memory address 1, but the queue memory would read t0 from memory address 0 first and only then t1. After processing by the target-value management unit, the target-value memory addresses are rearranged and written into the stack memory as shown in Figure 24. When the stack memory of Figure 24 is read, it first outputs the target value of the first group's last neuron, t1, and then the target value of the first neuron, t0, to the backward-pass hardware block. When that block computes the second group's output-layer error, the stack memory next reads the target value of the second group's last neuron, t3, and then the first neuron's target value, t2. Table 4 gives the target-value algorithm: it first defines the largest group number; the groups written to memory addresses start from the largest number; within a group the smallest memory address is computed, and addresses are written into the target-value stack memory one by one, incrementing each time. After one group's memory addresses are written, the next group's addresses are likewise arranged and written into the target stack memory until all target-value memory locations have been written.

3.2.4 Feedback-Node Control Unit

The biggest difference between the network architecture of a feedback-type neural network and a feedforward neural network is that the feedback-type network has a feedback processing layer. The feedback processing layer's inputs come from the feedback of hidden-layer or output-layer neurons, providing the network with local memory. The time-delay effect produced by the feedback action makes feedback-type neural networks well suited to time-dependent dynamic problems. The feedback-type neural network designed in the present invention provides not only a network with feedback from the hidden layer, but also adds a network with feedback from the output layer back to the hidden layer. The feedback-node control unit controls whether each feedback-layer neuron's output is passed to the next layer's neurons; when a feedback-layer neuron's output is not passed on, the neurons connected to it perform no feedback. The feedback control unit receives, on dedicated hardware pins, the signals indicating which neurons should feed back; each signal passes through a shift accumulator inside the feedback control unit and is assembled into a vector signal that controls the feedback of the hidden-layer or output-layer neurons of the feedback-type neural network.
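The per-node multiplexing this unit performs can be sketched as a simple mask: a 1 bit passes the fed-back neuron output through, while a 0 bit selects the multiplexer's constant-zero input, so the corresponding neuron contributes nothing downstream (function name illustrative):

```python
def apply_feedback_mask(feedback_outputs, mask_bits):
    """mask_bits[i] == 1 keeps neuron i's fed-back output;
    0 selects the multiplexer's zero input instead."""
    return [y if b else 0.0 for y, b in zip(feedback_outputs, mask_bits)]
```

Freely choosing which nodes feed back then reduces to choosing the bit vector, which in the hardware is assembled bit by bit in the shift accumulator.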

3.2.5 System Management Unit

The system management unit controls the flow of the entire neural network. Data passing between blocks must be coordinated with timing control for the algorithm's results to be computed correctly. The controller is designed as a finite state machine: by decoding the system's current state, it commands the circuit blocks associated with the next state to begin operating, as shown in Figure 25. Besides controlling the timing between blocks, the system management unit also controls the entire ring-serial architecture, the segmented computation, the accumulators, the multiplexer select lines, and the other management units.

The controller's main flow and states are shown in Figure 26. The flowchart divides into three major blocks: initialization, forward pass, and backward pass. When the system starts running, it first enters the initialization flow, zeroing the hardware's registers and memory parameters so that stale data cannot cause computation errors. The system then receives the network parameters, training data, and target data and writes them into the registers and memories. Once the data are stored, the management unit issues start signals instructing the weight, address, feedback-node, and target-value controllers to compute according to the network parameters. After the system receives the done signals from the weight, address, feedback-node, and target-value controllers, it writes the data needed by the forward and backward passes into the system's designated memories according to each management unit's results, completing initialization.

After the system completes initialization, it begins the forward pass. The system controller adjusts the multiplexer select lines according to the type of the first hidden layer's activation function, then sends the training data from the input queue memory to the ring-serial architecture for the weighted computation. For a feedback-type network, after the training data are written, the system controller decides, based on the feedback-node controller's output signals, whether the feedback layer's data are sent into the ring architecture. After the first hidden layer's neuron values are computed and stored, the controller checks the network size to decide whether the hardware must perform segmented computation until the first hidden layer is finished, then adjusts the multiplexer select lines according to the network type. For the hidden and output layers alike, the previous layer's values, computed in the ring-serial architecture, passed through the activation function block, and stored in memory, are fed back into the ring-serial architecture, completing the entire forward pass.

In learning mode, once the system finishes the forward pass, the backward pass begins. The controller instructs the relevant memories to output the target values; the differentiated activation function values and the forward-pass results are passed to the backward-pass block to compute the output-layer error. The controller sends the output-layer error and the weights arranged by the weight management unit through the ring-serial architecture, then feeds the result into the hidden-layer error block to obtain the hidden-layer error. After the hidden-layer error is computed, the controller passes the errors, the learning rate, and each neuron's previous-layer inputs, synchronized through the weight management unit with the clock cycles the Δw block requires, into the backward-pass block in a pipelined fashion to compute the weight corrections, update the weights, and store the new weights. Finally, the system checks whether all training data have been processed and the set number of training iterations has been reached, completing the entire algorithm.
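The forward pass, output-layer error, hidden-layer error, and weight-update stages described above follow standard back-propagation; a minimal software sketch of one training step for a one-hidden-layer sigmoid network (biases omitted, names illustrative, not the hardware's fixed-point implementation):

```python
import math

def train_step(x, t, W1, W2, lr):
    """One forward + backward pass: forward computation, output-layer
    error, hidden-layer error, then weight correction (updates W1, W2
    in place). W1[j][i] connects input i to hidden j; W2[k][j] connects
    hidden j to output k."""
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    # forward pass
    h = [sig(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = [sig(sum(w * hi for w, hi in zip(row, h))) for row in W2]
    # output-layer error (delta), using the sigmoid derivative y(1-y)
    d_out = [(tk - yk) * yk * (1 - yk) for tk, yk in zip(t, y)]
    # hidden-layer error: output deltas propagated back through W2
    d_hid = [hi * (1 - hi) * sum(d_out[k] * W2[k][j] for k in range(len(W2)))
             for j, hi in enumerate(h)]
    # weight corrections (the Δw blocks) and update
    for k in range(len(W2)):
        for j in range(len(h)):
            W2[k][j] += lr * d_out[k] * h[j]
    for j in range(len(W1)):
        for i in range(len(x)):
            W1[j][i] += lr * d_hid[j] * x[i]
    return y
```

Repeating this step over the training set for the configured number of iterations is exactly the loop the system management unit sequences in hardware.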

3.3 Software Planning

The system transmits training data and commands via the Nios II microprocessor to the Avalon bus, and through the design of the Avalon bus registers it supplies the hardware's network parameters and operating commands, such as the network type, the choice of activation functions, the training data, the network size, the weight data, the target values, and the learning rate. The advantage of this design is that when the network's type and size are to be modified, they can be set in software, with no need to reconfigure and recompile the hardware. Figure 27 shows the architecture of the system used in the present invention together with the Nios II. The hardware computes with fixed-point numbers, whereas the decimal type used by the Nios II software is IEEE 754 floating point. To lighten the Nios II processor's load and speed up the overall system, the float-to-fixed and fixed-to-float converters are placed in hardware. When the software sends training data to the hardware, the float-to-fixed converter first converts them to fixed-point format for the hardware computation. When the hardware finishes, the new weights pass through the fixed-to-float converter into floating-point form and are stored in memory, awaiting the software's read.
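The float/fixed conversions can be sketched in a few lines; the fractional bit width is an assumption for illustration, since the document does not state the hardware's fixed-point format:

```python
def float_to_fixed(x, frac_bits=16):
    """Quantize a float to a signed fixed-point integer with the given
    number of fractional bits (scale by 2^frac_bits and round)."""
    return int(round(x * (1 << frac_bits)))

def fixed_to_float(q, frac_bits=16):
    """Inverse conversion: divide by the same power of two."""
    return q / (1 << frac_bits)
```

Placing these scalings in hardware means the Nios II never touches fixed-point arithmetic: it reads and writes IEEE 754 values while the converters handle the format change on the bus.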

Figure 28 is the software planning flowchart. Beyond the hardware design, the system's normal operation depends on parameters issued by software. When the system first starts, the software issues a hardware reset command, then defines the network type, test or training mode, weight data, target data, input data, learning rate, number of training iterations, each layer's activation function type, and the network architecture. Once the network settings and data have all been transferred to the hardware, the command to begin training is issued; the hardware then computes until it replies over the Avalon interface with a computation-complete signal. On receiving the done signal, the software can issue the command to read the results. Table 5 lists the functions of the registers on the Avalon bus responsible for communication between the controlled-end hardware and the host-end software.

The hardware design of the present invention uses ModelSim to verify the function of each controller and to verify that the hardware neural network runs the back-propagation algorithm correctly. To simulate how the hardware can synthesize different neural networks through parameter changes, two different test benches were designed, synthesizing a 1*3*4 back-propagation network and a 1*4*1 feedback-type neural network; the two networks are shown in Figures 29 and 30. In the figures, the number inside a neuron is the neuron's index, wij denotes the weight between neuron i and neuron j, and bk denotes the bias of neuron k.

4.2.1 Simulation of the Three-Layer Back-Propagation Neural Network

When the hardware was designed, every block had to receive and output correct values on the correct clock cycles. This subsection follows the flow of Figure 3.25 and uses ModelSim waveform plots to show how the memories and signal lines change during each important stage. Table 6 gives the initial values of the network parameters. At initialization, the network parameters are passed to the weight, address, and target-value management units, which perform their respective tasks to complete initialization. Then, in the forward-pass flow, the weight values are written into the processing elements' memories according to the processing-element addresses, after which the training data stored in the input queue memory are read. After the processing elements perform the multiply-accumulate operation, the activation function block computes and outputs the activation function values and their derivatives. The processing-element addresses into which the 22 weights are written are shown in Figures 31, 32, and 33. The activation function block outputs the activation function values and the activation function derivatives simultaneously. The logsig, tansig, and linear pins in Figure 34 are the output signal lines of the sigmoid, hyperbolic tangent, and linear functions, respectively, and d_logsig, d_tansig, and d_linear are the functions' derivatives. In training mode, the forward pass is followed by the backward pass, which first computes the output-layer error. In Figure 35, the system management unit reads the target-value stack memory, the output-layer neuron outputs, and the activation function derivatives into the output-layer error block of the backward-pass hardware to compute the error. The output-layer error is then passed to the processing-element array, and the result is combined with the hidden-layer output derivatives to compute the hidden-layer error. After the output-layer and hidden-layer error computations are complete, the correction-weight and update-weight blocks in the backward-pass hardware compute the weight corrections and the corrected new weights. Reading the memory where the new weights are stored, the values can be observed on the new_weight_stack_output pin, as shown in Figures 36, 37, 38, and 39.

4.2.2 Simulation of the Elman Neural Network

The Elman network simulated here differs from a conventional Elman network: in addition to applying the free-feedback-node feature of the present invention, a feedback layer for the output layer is added. The experimental parameters are listed in Table 7. During initialization, the signal output by the feedback-node control unit determines whether each feedback-layer multiplexer outputs the feedback-layer node's data or 0. In this experiment the first and third hidden-layer neurons do not feed back; the vector signals that the feedback-node management unit outputs to the system management unit, stored in the judge1_signal and judge2_signal registers, are 0000001010 and 0000000001, respectively, as shown in Figure 40. While the processing-element array reads in the feedback-layer data one element at a time, the system management unit synchronously reads, one bit per clock cycle, the signal indicating whether each neuron feeds back: if the bit is 1, the feedback datum is passed to the processing-element array for computation; if it is 0, it is not.
This input scheme affects the output of each processing element's multiply-accumulator: under continuous input, the value of the accumulator (net-input) register changes once per clock cycle, but when the input is 0 it retains the value from the previous cycle. Since only the outputs of the second and fourth hidden-layer neurons are fed back here, the accumulator register holds each value for two of the four clock cycles, as shown in Figure 41. The activation functions of the first two hidden-layer neurons are computed first, because the hardware is configured with a maximum of two processing elements; the remaining two neurons are then computed in a second segment by re-inputting the training sample and the feedback-layer data. The activation functions of the last two hidden-layer neurons are shown in Figures 42 and 43. After the forward computation, the backward computation calculates the errors of the output layer and the hidden layer. For feedback networks, including the Elman network, the error-calculation flow is the same as for a feedforward network. When computing the weight corrections, however, the Elman network differs from the general feedback network in that the weights connected to the feedback layer are fixed and are not changed by training. The system management unit therefore checks whether the network is an Elman network before computing the weight corrections.
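The multiplexer gating described above can be mimicked in software with the judge bit vector: a 1 passes the feedback-layer value to the multiply-accumulate, a 0 substitutes zero so the accumulator keeps its previous value for that clock. This is a hypothetical sketch, assuming a four-neuron hidden layer with only the second and fourth neurons fed back:

```python
def gated_feedback(feedback_values, judge_bits):
    """Model the feedback-layer multiplexers: each judge bit selects the
    node's value (1) or 0 (0), as the judge signal is read one bit per
    clock cycle."""
    return [v if bit == 1 else 0.0
            for v, bit in zip(feedback_values, judge_bits)]

def accumulate(acc, inputs, weights):
    """Multiply-accumulate over successive clocks; a zero input leaves the
    running sum unchanged, mirroring the held register value."""
    trace = []
    for v, w in zip(inputs, weights):
        acc += v * w
        trace.append(acc)  # register value observed at each clock
    return trace
```

With unit weights, the accumulator trace holds the same value for two consecutive clocks whenever a masked (zero) input arrives, which is the behavior visible in the waveform of Figure 41.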

4.3 Experimental Results

4.3.1 Curve Fitting

To analyze the execution performance of the hardware, a three-layer back-propagation network is implemented with different numbers of hardware processing elements to perform curve fitting on a sine function; the fitted equation is Eq. (4.2). Fifty training samples are randomly drawn between 0 and 1, shown as the x positions of the red circles in Figure 44. The input training datum is the independent variable x; the dependent variable y is the target value of the experiment. The experimental parameters are set as in Table 8. A 1*4*1 network architecture is used to train the network to map the input-output relationship of the sine function. The initial weights are random numbers in [-0.5, 0.5], and the training data x range from 0 to 3.14. The results are shown in Figures 44 and 45. The red circles represent the outputs obtained by feeding the training data to the trained network, the green symbols represent the recall outputs of the network for the test data, and the blue line is the target.
The experiment compares, under the same initial weights, how the number of synthesized hardware processing elements affects computation speed; the results are analyzed in Table 9. A hardware network with fewer processing elements must compute the neuron outputs in more segments and therefore spends more time on computation. In the experiment, the hardware network with 3 processing elements took longer to train than the one with 2. The reason is the weight-arrangement problem of the segmented-computation technique: the hardware network with 3 processing elements actually synthesizes a 1*6*1 hardware network to compute the 1*4*1 neural network. The controller must do extra processing for the redundant processing elements, so that, for the same number of segments, the network with 3 processing elements trains more slowly than the one with 2. With the same initial weights and the same number of training iterations, the hardware network arrives at the same updated weights regardless of the number of processing elements, so the resulting errors are identical.
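The padding effect behind the 1*6*1 observation can be illustrated with a hypothetical helper that splits a hidden layer across a fixed number of processing elements: with 2 PEs a 4-neuron layer needs 2 passes and no padding, while with 3 PEs the layer is padded to 6 virtual neurons, the source of the extra controller work noted above.

```python
import math

def segments(n_neurons, n_pe):
    """Number of passes needed to compute n_neurons with n_pe
    processing elements (ceiling division)."""
    return math.ceil(n_neurons / n_pe)

def padded_neurons(n_neurons, n_pe):
    """Neurons actually synthesized: each pass occupies all PEs,
    so the layer is padded up to a multiple of n_pe."""
    return segments(n_neurons, n_pe) * n_pe
```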

y(x) = sin(x) (4.2)

4.3.2 Voltage Prediction during Battery Discharge

To verify that the proposed system can implement feedback neural networks, a voltage-prediction experiment is conducted on battery discharge data provided by the Jiabaiyu company. The data record one set of discharge measurements every 10 seconds, with the battery power held near -29.1 watts. The voltage, current, and battery temperature in the battery data are used as the training data of the neural network and are normalized according to Eqs. (4.3)-(4.5) before being fed into the network, so that the input values lie in [0.2, 0.8]; as shown in Figures 47 to 49, the left-hand plots show the raw data and the right-hand plots show the normalized data. Table 10 describes the network parameters of the experiment. The initial weights are random numbers drawn from [-0.5, 0.5]. Online learning is used: each time a set of data is input, the network is trained until the stopping condition is met, here set to 1000 training iterations. After training, the network takes the second data set as input to predict the voltage of the third set; the hardware network then trains on the second set and takes the third set as input to predict the voltage of the fourth set, and so on until all training samples have been used.
Figure 50 shows the normalized discharge curve predicted by the hardware network. Comparing the predicted voltages with the actual voltage curve, the predictions are slightly larger than the actual values; Figure 51 shows the error between the two discharge curves, and the prediction error decreases toward the end of the discharge. Table 11 gives the experimental results: training on each set of data takes 0.0047 seconds on average, after which the voltage 10 seconds ahead is predicted.
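The rolling train-then-predict scheme can be sketched as follows. The `train` and `predict` routines and the list of normalized data sets are hypothetical stand-ins for the hardware operations; the structure of the loop is what matters here.

```python
def rolling_forecast(datasets, train, predict, epochs=1000):
    """Online learning: train on set k for `epochs` iterations (the
    stopping condition), then feed set k+1 in to predict the next
    voltage; repeat over all training samples."""
    predictions = []
    for k in range(len(datasets) - 1):
        for _ in range(epochs):
            train(datasets[k])
        predictions.append(predict(datasets[k + 1]))
    return predictions
```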

ν = 1.7e-4 * (ν - 8856) + 0.2, (4.3)

i = 7.41656e-4 * (i + 3200) + 0.2, (4.4)

t = 4.1958e-2 * (t - 26.25) + 0.2. (4.5)
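Equations (4.3)-(4.5) are affine rescalings into [0.2, 0.8]. Read literally (with e-4 and e-2 as powers of ten), they can be applied to the raw voltage, current, and temperature readings as:

```python
def normalize_voltage(v):
    # Eq. (4.3): affine map of the raw voltage reading into [0.2, 0.8]
    return 1.7e-4 * (v - 8856) + 0.2

def normalize_current(i):
    # Eq. (4.4)
    return 7.41656e-4 * (i + 3200) + 0.2

def normalize_temperature(t):
    # Eq. (4.5)
    return 4.1958e-2 * (t - 26.25) + 0.2
```

At the reference points of each equation (v = 8856, i = -3200, t = 26.25) the normalized value is exactly the lower bound 0.2.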

4.3.3 Iris Classification Experiment

The proposed hardware neural network is trained on the iris dataset, in which each sample is described by four real-valued features: sepal length, sepal width, petal length, and petal width. The learning goal is to distinguish the three iris subspecies (Iris Setosa, Iris Versicolor, and Iris Virginica); the target data of the experiment are {0,0,1}, {0,1,0}, and {1,0,0}, representing the three classes. Through commands issued from the NIOS II, the hardware synthesizes a 4*4*5*3 four-layer back-propagation neural network; the experimental parameters are listed in Table 12. The iris data comprise 75 sets in total: 35 sets are used as training data and 40 sets as test data. The hardware network adjusts the weights and biases through iterative training with the back-propagation learning algorithm. Table 13 analyzes the experimental results. In recall mode, the recognition rate on the training data is 100%; on the test data, the recognition rates for Iris Setosa and Iris Versicolor are also 100%, but the rate for Iris Virginica is only 92.86%. The main reason is that some Iris Virginica samples are very similar to Iris Versicolor samples, so the system misclassifies Iris Virginica as Iris Versicolor.
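The class decision and the recognition rates in Table 13 follow directly from the one-hot targets. A minimal sketch, assuming a winner-take-all reading of the three output neurons (an assumption, since the source does not state the decision rule explicitly):

```python
def classify(outputs):
    """Winner-take-all: the index of the largest output neuron decides
    the class, matching one-hot targets {0,0,1}, {0,1,0}, {1,0,0}."""
    return max(range(len(outputs)), key=lambda i: outputs[i])

def recognition_rate(predicted, actual):
    """Percentage of samples whose predicted class matches the label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)
```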

The above detailed description is a specific explanation of one feasible embodiment of the present invention; this embodiment is not intended to limit the patent scope of the invention, and any equivalent implementation or modification that does not depart from the spirit of the invention shall be included in the patent scope of this application.

In summary, the present application is not only genuinely innovative in its configuration but also provides the multiple advantages described above over conventional devices; it should fully satisfy the statutory requirements of novelty and inventive step for an invention patent. The application is hereby filed in accordance with the law, and it is respectfully requested that the Office grant this invention patent application.

Figure 1 Schematic diagram of the prior art

Figure 2 Forward-computation architecture diagram

Figure 3 Segmented-computation flowchart

Figure 4 Backward-computation architecture diagram

Figure 5 Internal hardware architecture of the processing element

Figure 6 Fixed-point fraction representation

Figure 7 Schematic diagram of conventional interpolation and intercept interpolation

Figure 8 Piecewise-linear method hardware architecture diagram

Figure 9 Schematic diagram of the piecewise-linear method

Figure 10 Sigmoid-function hardware architecture of the improved piecewise-linear method

Figure 11 Hyperbolic-tangent-function hardware architecture diagram

Figure 12 Symmetry of the sigmoid function

Figure 13 Symmetry of the hyperbolic tangent function

Figure 14 Sigmoid-function symmetry hardware architecture diagram

Figure 15 Hyperbolic-tangent-function symmetry hardware architecture diagram

Figure 16 Weight-memory address management unit hardware block diagram

Figure 17 Improved Elman neural network 1-3-2 architecture diagram

Figure 18 State diagram of the backward-computation weight-memory address management unit

Figure 19 Weight-memory data configuration diagram

Figure 20 Processing-element memory contents diagram

Figure 21 Processing-element weight configuration diagram (1)

Figure 22 Processing-element weight configuration diagram (2)

Figure 23 Target-value queue memory configuration diagram

Figure 24 Target-value stack memory configuration diagram

Figure 25 Control-unit finite state machine diagram

Figure 26 System management unit flowchart

Figure 27 Hardware and Nios II architecture diagram

Figure 28 Software planning flowchart

Figure 29 Three-layer back-propagation neural network of size 1*3*4

Figure 30 Feedback neural network of size 1*4*1

Figure 31 Waveform of weights written into the processing elements (1)

Figure 32 Waveform of weights written into the processing elements (2)

Figure 33 Waveform of weights written into the processing elements (3)

Figure 34 Output of the activation-function block

Figure 35 Signal sources for computing the output-layer error

Figure 36 New weight value waveform (1)

Figure 37 New weight value waveform (2)

Figure 38 New weight value waveform (3)

Figure 39 New weight value waveform (4)

Figure 40 Signals that determine whether the hidden-layer and output-layer neurons feed back

Figure 41 Net-input (accumulator) values output by the processing elements

Figure 42 Output of the activation-function block (1)

Figure 43 Output of the activation-function block (2)

Figure 44 Comparison of the actual sine curve and the hardware-fitted curve

Figure 45 Error analysis between the network outputs and target values for the training data

Figure 46 Error analysis between the network outputs and target values for the test data

Figure 47 Battery voltage and normalized voltage data

Figure 48 Battery current and normalized current data

Figure 49 Battery temperature and normalized temperature data

Figure 50 Comparison of battery discharge curves predicted by the feedback neural network

Figure 51 Error of the battery discharge curve predicted by the feedback neural network

Claims (11)

1. A high-speed hardware back-propagation and feedback neural network with free feedback nodes, wherein the operating modes of the network comprise: a four-layer back-propagation neural network mode; and a freely selectable feedback-node neural network mode.

2. The operating mode of claim 1, wherein software updates the contents of a register to determine the operating mode of the network.

3. A high-speed hardware back-propagation and feedback neural network with free feedback nodes, whose hardware architecture comprises: an input device, the input device being a device or microprocessor capable of generating digital signals; programmable hardware, wherein the programmable hardware interfaces with the input device, receives the data of the input device, and performs computation, the programmable hardware being a field-programmable gate array; and an output device, the output device interfacing with the programmable hardware and receiving the data computed by the programmable hardware, the output device being a device or microprocessor capable of receiving digital signals.

4. The operating mode of claim 1, wherein the architecture of the freely selectable feedback-node neural network mode comprises: an input layer; a first feedback processing layer; multiplexers; a hidden layer; and an output layer; the first feedback processing layer receives the hidden-layer output of the previous iteration as input data and passes the received input data to the hidden layer of the current iteration; the weights between the feedback processing layer and the hidden layer are fixed; the multiplexers control the nodes of the feedback processing layer so that the network freely selects which nodes to feed back, or selects not to feed back, wherein the connection weights of nodes that are not fed back are 0.

5. The operating mode of claim 1, wherein the architecture of the freely selectable feedback-node neural network mode comprises: a second feedback processing layer; multiplexers; a hidden layer; and an output layer; the second feedback processing layer receives the output-layer output of the previous iteration as input data and passes the received input data to the output layer of the current iteration; the weights between the feedback processing layer and the output layer are fixed; the multiplexers control the nodes of the feedback processing layer so that the network freely selects which nodes to feed back, or selects not to feed back, and the connection weights of nodes that are not fed back are 0; the input data of the hidden layer come from the first feedback processing layer and the input layer and, after multiply-accumulation with the weights, pass through the activation function of the hidden layer, the result being output to the output layer; the output layer receives the hidden-layer outputs and the second feedback processing layer and, after multiply-accumulation with the weights, obtains the output of the output layer through the activation function of the output layer.

6. The operating mode of claim 1, wherein the four-layer back-propagation neural network mode comprises: an input layer; a first hidden layer; a second hidden layer; and an output layer; the input layer receives external signals as input data and passes the received input data to the first hidden layer; the input data of the first hidden layer, after multiply-accumulation with the weights, pass through the activation function of the first hidden layer, the result being output to the second hidden layer; the second hidden layer receives the outputs of the first hidden layer and, after multiply-accumulation with the weights, passes them through the activation function of the second hidden layer, the result being output to the output layer; after multiply-accumulation with the weights of the output layer, the output of the output layer is obtained through the activation function of the output layer.

7. The freely selectable feedback-node neural network mode of claim 4 or 5, wherein control software updates the contents of registers to determine the activation functions used by the hidden layer and the output layer, respectively.

8. The freely selectable feedback-node neural network mode of claim 4 or 5, wherein control software updates the contents of registers to determine the initial weight values used between the feedback processing layer and the hidden layer, respectively.

9. The four-layer back-propagation neural network mode of claim 6, wherein control software updates the contents of registers to determine the activation functions used by the first hidden layer, the second hidden layer, and the output layer, respectively.

10. The freely selectable feedback-node neural network mode of claim 7, wherein the activation function is a sigmoid function or a hyperbolic tangent function.

11. The four-layer back-propagation neural network mode of claim 9, wherein the activation function is a sigmoid function or a hyperbolic tangent function.
TW101102121A 2012-01-19 2012-01-19 High-speed hardware-based back-propagation feedback type artificial neural network with free feedback nodes TW201331855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW101102121A TW201331855A (en) 2012-01-19 2012-01-19 High-speed hardware-based back-propagation feedback type artificial neural network with free feedback nodes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW101102121A TW201331855A (en) 2012-01-19 2012-01-19 High-speed hardware-based back-propagation feedback type artificial neural network with free feedback nodes

Publications (1)

Publication Number Publication Date
TW201331855A true TW201331855A (en) 2013-08-01

Family

ID=49479043

Family Applications (1)

Application Number Title Priority Date Filing Date
TW101102121A TW201331855A (en) 2012-01-19 2012-01-19 High-speed hardware-based back-propagation feedback type artificial neural network with free feedback nodes

Country Status (1)

Country Link
TW (1) TW201331855A (en)

Cited By (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12277496B2 (en) 2015-05-21 2025-04-15 Google Llc Batch processing in a neural network processor
US11755895B2 (en) 2015-05-21 2023-09-12 Google Llc Rotating data for neural network computations
US9747548B2 (en) 2015-05-21 2017-08-29 Google Inc. Rotating data for neural network computations
US9747546B2 (en) 2015-05-21 2017-08-29 Google Inc. Neural network processor
US9805303B2 (en) 2015-05-21 2017-10-31 Google Inc. Rotating data for neural network computations
US9805304B2 (en) 2015-05-21 2017-10-31 Google Inc. Prefetching weights for use in a neural network processor
US9842293B2 (en) 2015-05-21 2017-12-12 Google Inc. Batch processing in a neural network processor
TWI622939B (en) * 2015-05-21 2018-05-01 谷歌有限責任公司 Method,system and computer-readable medium for performing neural network computations for a neural network
US10049322B2 (en) 2015-05-21 2018-08-14 Google Llc Prefetching weights for use in a neural network processor
US11216726B2 (en) 2015-05-21 2022-01-04 Google Llc Batch processing in a neural network processor
US11281966B2 (en) 2015-05-21 2022-03-22 Google Llc Prefetching weights for use in a neural network processor
US10083395B2 (en) 2015-05-21 2018-09-25 Google Llc Batch processing in a neural network processor
US12530579B2 (en) 2015-05-21 2026-01-20 Google Llc Systolic array processor for neural network computation
US9697463B2 (en) 2015-05-21 2017-07-04 Google Inc. Computing convolutions using a neural network processor
US10192162B2 (en) 2015-05-21 2019-01-29 Google Llc Vector computation unit in a neural network processor
US12333417B2 (en) 2015-05-21 2025-06-17 Google Llc Rotating data for neural network computations
US9710748B2 (en) 2015-05-21 2017-07-18 Google Inc. Neural network processor
US10438117B1 (en) 2015-05-21 2019-10-08 Google Llc Computing convolutions using a neural network processor
US10074051B2 (en) 2015-05-21 2018-09-11 Google Llc Vector computation unit in a neural network processor
US12277499B2 (en) 2015-05-21 2025-04-15 Google Llc Vector computation unit in a neural network processor
US12014272B2 (en) 2015-05-21 2024-06-18 Google Llc Vector computation unit in a neural network processor
US10699188B2 (en) 2015-05-21 2020-06-30 Google Llc Neural network processor
US11853865B2 (en) 2015-05-21 2023-12-26 Google Llc Prefetching weights for use in a neural network processor
US10878316B2 (en) 2015-05-21 2020-12-29 Google Llc Prefetching weights for use in a neural network processor
TWI715835B (en) * 2015-05-21 2021-01-11 美商谷歌有限責任公司 Circuit, method and non-transitory machine-readable storage devices for performing neural network computations
US11227216B2 (en) 2015-05-21 2022-01-18 Google Llc Batch processing in a neural network processor
US11620513B2 (en) 2015-05-21 2023-04-04 Google Llc Computing convolutions using a neural network processor
US11620508B2 (en) 2015-05-21 2023-04-04 Google Llc Vector computation unit in a neural network processor
US11049016B2 (en) 2015-05-21 2021-06-29 Google Llc Neural network processor
US11586920B2 (en) 2015-05-21 2023-02-21 Google Llc Neural network processor
US11170291B2 (en) 2015-05-21 2021-11-09 Google Llc Rotating data for neural network computations
US11210580B2 (en) 2015-05-21 2021-12-28 Google Llc Rotating data for neural network computations
US10552732B2 (en) 2016-08-22 2020-02-04 Kneron Inc. Multi-layer neural network
TWI634489B (en) * 2016-08-22 2018-09-01 Kneron Inc. Multi-layer artificial neural network
TWI643138B (en) * 2016-10-03 2018-12-01 三菱電機股份有限公司 Network construction device and network construction method
US10719776B1 (en) 2016-12-07 2020-07-21 Google Llc Quantum bit multi-state reset
US10628753B2 (en) 2016-12-07 2020-04-21 Google Llc Quantum bit multi-state reset
TWI736716B (en) * 2016-12-26 2021-08-21 大陸商上海寒武紀信息科技有限公司 Device and method for neural network computation based on high bandwidth storage
TWI688838B (en) * 2017-10-06 2020-03-21 日商佳能股份有限公司 Control device, lithography device, measuring device, processing device, flattening device, and article manufacturing method
TWI640933B (en) * 2017-12-26 2018-11-11 中華電信股份有限公司 Two-stage feature extraction system and method based on neural network
CN110083502A (en) * 2018-01-26 2019-08-02 三星电子株式会社 For monitoring the method and system of the information of memory module in real time
US11399079B2 (en) 2018-02-14 2022-07-26 Eingot Llc Zero-knowledge environment based networking engine
CN109816105B (en) * 2019-01-16 2021-02-23 北京时代民芯科技有限公司 Configurable neural network activation function implementation device
CN109816105A (en) * 2019-01-16 2019-05-28 北京时代民芯科技有限公司 A Configurable Neural Network Activation Function Implementation Device
EP3771958A1 (en) * 2019-07-31 2021-02-03 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and apparatus for reducing storage space of parameter table, device, and computer-readable storage medium
KR20210015611A (en) * 2019-07-31 2021-02-10 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. Method and apparatus for reducing storage space of parameter table, device, and computer-readable storage medium
US11210322B2 (en) 2019-07-31 2021-12-28 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for reducing storage space of parameter table, device, and computer-readable storage medium
KR102392977B1 (en) 2019-07-31 2022-04-29 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. Method and apparatus for reducing storage space of parameter table, device, and computer-readable storage medium
TWI770668B (en) * 2019-11-25 2022-07-11 旺宏電子股份有限公司 Operation method for artificial neural network
US12327185B2 (en) 2019-11-25 2025-06-10 Macronix International Co., Ltd. Operation method for artificial neural network

Similar Documents

Publication Publication Date Title
TW201331855A (en) High-speed hardware-based back-propagation feedback type artificial neural network with free feedback nodes
US11740870B2 (en) Convolutional network hardware accelerator device, system and method
TWI525558B (en) Resilient high - speed hardware reverse transfer and feedback type neural network system
JP3210319B2 (en) Neurochip and neurocomputer using the chip
TWI417798B (en) High - speed reverse transfer neural network system with elastic structure and learning function
WO2017177442A1 (en) Discrete data representation supported device and method for forward operation of artificial neural network
CN114356840B (en) SoC system with in-memory/near-memory computing modules
Ghodrati et al. Bit-parallel vector composability for neural acceleration
CN111105023A (en) Data stream reconstruction method and reconfigurable data stream processor
Yi et al. RDCIM: RISC-V supported full-digital computing-in-memory processor with high energy efficiency and low area overhead
Streat et al. Non-volatile hierarchical temporal memory: Hardware for spatial pooling
Nambiar et al. Hardware implementation of evolvable block-based neural networks utilizing a cost efficient sigmoid-like activation function
Raha et al. Design considerations for edge neural network accelerators: An industry perspective
Liu et al. A dynamic execution neural network processor for fine-grained mixed-precision model training based on online quantization sensitivity analysis
WO2020008643A1 (en) Data processing device, data processing circuit, and data processing method
Gadea-Gironés et al. Task parallelism-based architectures on FPGA to optimize the energy efficiency of AI at the edge
Afifi et al. ARTEMIS: A mixed analog-stochastic In-DRAM accelerator for transformer neural networks
Wang et al. Syscim: A heterogeneous chip architecture for high-efficiency cnn training at edge
Joseph et al. Performance-driven LSTM accelerator hardware using split-matrix-based MVM
Chen et al. FPGA Accelerating Multi-Source Transfer Learning with GAT for Bioactivities of Ligands Targeting Orphan G Protein-Coupled Receptors
TW202427266A (en) Computing in memory accelerator for applying to a neural network
Su et al. Processing element architecture design for deep reinforcement learning with flexible block floating point exploiting signal statistics
Taha et al. HEPPO: Hardware-Efficient Proximal Policy Optimization a Universal Pipelined Architecture for Generalized Advantage Estimation
Hojabr et al. Taxonn: A light-weight accelerator for deep neural network training
US12210872B2 (en) Neural processing device, processing element included therein and method for operating various formats of neural processing device