DE112008002158T5

DE112008002158T5 - Method and system for multiplying large numbers

Info

Publication number: DE112008002158T5
Application number: DE112008002158T
Authority: DE
Inventors: Vincent Dupaquis; Russell Hobson
Original assignee: Atmel Corp
Current assignee: Rambus Inc
Priority date: 2007-08-10
Filing date: 2008-08-08
Publication date: 2010-06-17
Anticipated expiration: 2028-08-09
Also published as: US8028015B2; TWI438678B; CN101790718A; CN101790718B; TW200915174A; US20090043836A1; DE112008002158B4; WO2009023595A1

Abstract

A computer-implemented method of operating a multiplication circuit to compute the product of two operands (A and B), at least one of which is wider than a width associated with the multiplication circuit, wherein each of the operands comprises one or more contiguous, ordered, word-wide operand segments (A_j and B_i) characterized by specific weights j and i, where j is an integer from 0 to k, i is an integer from 0 to m, and a word is a specified number of bits (n), and wherein the multiplication circuit executes a matrix of word-wide operand segment pair multiplication operations, the matrix comprising m + 1 rows and k + m + 2 columns, each row having a weight x and each column having a weight y, and the multiplication circuit having access to a memory, the method comprising:
Performing multiplication operations on one pair of rows at a time, wherein for each pair of rows a pair ...

Description

Technical Field

The invention relates to information processing.

Background

Conventional multiplication hardware, for example in a solid-state device, can have size limitations, such as a specified number of bits that the hardware can process at one time. Typically, multiplication hardware is defined as having a pair of single-word operand inputs and a two-word result output. To perform multiply-accumulate operations, the multiplier output can be connected to an accumulator that is typically at least two words plus one bit wide. The additional bit can be part of the result, or it can simply be present as carry information indicating either an overflow in the case of an addition or an underflow in the case of a subtraction in the accumulation part of the operation.

In cryptography and other applications, there is a need to multiply very large integers comprising a large number of words. To perform these operations with operands that are much wider than the multiplication hardware, the operands can be divided into one-word-wide segments and fed to the hardware in a specified sequence. The segments are processed, and the intermediate results accumulated, such that the final product is computed as a sum of cross products of different weights. The word-wide operand segments and the partial results are stored in a memory that is accessed by an operation sequencer of the multiplier hardware. For example, a sequence can hold a segment of the first operand constant while the segments of the second operand are scanned word by word into the multiplier, after which the first operand is incremented to its next word-wide segment and the scan of the second operand is repeated.
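The conventional sequence described above can be sketched in software as follows. This is a minimal illustration and not the hardware itself: the function names `to_words`, `from_words`, and `multiply_row_by_row`, the least-significant-word-first list layout, and the default 32-bit word size are all assumptions made for the example.

```python
def to_words(value, count, word_bits=32):
    """Split a non-negative integer into `count` words, least significant first."""
    mask = (1 << word_bits) - 1
    return [(value >> (word_bits * idx)) & mask for idx in range(count)]


def from_words(words, word_bits=32):
    """Recombine words (least significant first) into one integer."""
    return sum(word << (word_bits * idx) for idx, word in enumerate(words))


def multiply_row_by_row(a_words, b_words, word_bits=32):
    """Hold one segment of the first operand, scan every segment of the second."""
    mask = (1 << word_bits) - 1
    result = [0] * (len(a_words) + len(b_words))  # stands in for the memory
    for j, a_seg in enumerate(a_words):           # segment held constant
        carry = 0
        for i, b_seg in enumerate(b_words):       # segments scanned word by word
            # Partial result word + one cross product + carry from previous word.
            acc = result[i + j] + a_seg * b_seg + carry
            result[i + j] = acc & mask            # lower word back to memory
            carry = acc >> word_bits              # upper word carried forward
        result[j + len(b_words)] = carry          # last carry of this pass
    return result
```

For any two 128-bit values `a` and `b`, `from_words(multiply_row_by_row(to_words(a, 4), to_words(b, 4)))` equals `a * b`. In this sketch every cross product touches the partial result in memory once for reading and once for writing.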

Summary

The invention relates to the multiplication of large numbers. In general, in one aspect, the invention provides a computer-implemented method, a system, and a computer program product for operating a multiplication circuit to compute the product of two operands (A and B), at least one of which is wider than a width associated with the multiplication circuit. Each of the operands includes one or more contiguous, ordered, word-wide operand segments (A_j and B_i) characterized by specific weights j and i, where j is an integer from 0 to k, i is an integer from 0 to m, and a word is a specified number of bits (n). The multiplication circuit executes a matrix of word-wide operand segment pair multiplication operations, the matrix including m + 1 rows and k + m + 2 columns, each row having a weight x and each column having a weight y. The multiplication circuit has access to a memory. Multiplication operations are performed on one pair of rows at a time. For each pair of rows, a pair of corresponding B_i word-wide operand segments is read from the memory, and word-wide operand segment pair multiplication operations (A_j·B_i) are performed iteratively for each of k + 2 columns, such that for each column in the matrix a maximum of two additional memory read operations and one memory write operation is required. Other implementations are also disclosed.

Implementations of the invention can provide one or more of the following advantages. The multiplication circuit described can compute one pair of rows at a time while requiring only three memory accesses (two reads and one write) per column, beyond the initial reads of the word-wide operand segments corresponding to each row. This makes it possible to implement a more efficient memory interface as a single dual-port RAM or as two single-port RAMs. A further advantage is that the pairs of rows can be computed out of sequence. Randomizing the order of the row computations can provide improved protection of sensitive data used in the computations. Because fewer memory accesses are needed, the power consumption of the multiplication circuit can be lower than that of other conventional circuits.

Details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

Brief Description of the Drawings

FIG. 1 is a block diagram showing an example of a multiplication circuit.

FIG. 2 is a schematic representation of a multiplication matrix of a first example.

FIG. 3 is a flowchart showing an example process for computing a product of two operands according to the multiplication matrix of FIG. 2.

FIG. 4 is a schematic representation of a multiplication matrix of a second example.

Like reference numerals in the various figures denote like elements.

Detailed Description

Certain applications require multiplying together numbers that are larger than the machine size of the hardware used to compute the result. In an illustrative example, a microprocessor with a machine size of 32 bits may be required to compute the result of a multiplication with 128-bit input operands. Because the input data are larger than the machine size of the microprocessor, the input data can be stored in a RAM or other similar temporary storage, or can reside in caches or registers located within the microprocessor. Given two 128-bit input operands A and B that are stored in a RAM and are to be multiplied by a 32-bit microprocessor, where
A = 0x11111111222222223333333344444444; and
B = 0x55555555666666667777777788888888
(with 0x denoting a hexadecimal number),
the computation can be divided into machine-size words, in this example 32-bit word-wide operand segments, as follows:
A = A₀ + A₁·2³² + A₂·2⁶⁴ + A₃·2⁹⁶; and
B = B₀ + B₁·2³² + B₂·2⁶⁴ + B₃·2⁹⁶
where:
A₀ = 0x44444444; A₁ = 0x33333333; A₂ = 0x22222222; A₃ = 0x11111111; and
B₀ = 0x88888888; B₁ = 0x77777777; B₂ = 0x66666666; B₃ = 0x55555555

The computation proceeds with each 32-bit word-wide operand segment of the first operand A being multiplied by each of the word-wide operand segments of the other, second operand B. A multiplication circuit and a process for operating the multiplication circuit are described below that can reduce the number of read and write accesses to a memory containing the operands A and B, thereby providing a more efficient memory interface.
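The division of the example operands into 32-bit segments can be checked with a few lines of Python. The helper names `split_into_segments` and `recombine` are hypothetical and chosen only for this illustration; the operand values are those given in the example.

```python
WORD_BITS = 32  # machine size used in the example

A = 0x11111111222222223333333344444444
B = 0x55555555666666667777777788888888


def split_into_segments(value, count=4, word_bits=WORD_BITS):
    """Return the segments ordered by weight: index 0 is the least significant word."""
    mask = (1 << word_bits) - 1
    return [(value >> (word_bits * weight)) & mask for weight in range(count)]


def recombine(segments, word_bits=WORD_BITS):
    """Evaluate the sum of segment * 2**(word_bits * weight) over all weights."""
    return sum(seg << (word_bits * weight) for weight, seg in enumerate(segments))


a_segments = split_into_segments(A)  # A0, A1, A2, A3
b_segments = split_into_segments(B)  # B0, B1, B2, B3
```

Here `a_segments` is `[0x44444444, 0x33333333, 0x22222222, 0x11111111]`, matching A₀ through A₃, and `recombine(a_segments)` returns the original value of A.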

Example System for the Multiplication Circuit

FIG. 1 shows a system 100 that includes a multiplication circuit 102. The system 100 can compute a product of two operands that can be wider than the multiplication circuit 102. For example, the multiplication circuit 102 can be implemented using 32-bit-wide hardware, while the operands can be 64 bits, 128 bits, 1024 bits, or any other number of bits greater than 32. The system 100 is designed to compute the product of the operands efficiently by reducing the number of memory operations while keeping the hardware small.

The multiplication circuit 102 includes a cache 104, a multiplier 106, and an accumulator 108. The multiplier 106 receives inputs from the cache 104. The inputs can be, for example, word-wide operand segments of the multiplication operands. In some implementations, the multiplier 106 can multiply two word-wide operand segments to form an intermediate product at least two words wide. For example, the multiplier 106 can multiply two 32-bit operand segments to form an intermediate product of at least 64 bits. An intermediate product resulting from the word-wide operand segment pair multiplication operations is received by the accumulator 108.

The accumulator 108 receives inputs from the multiplier 106 and the cache 104. For example, the accumulator 108 can receive the intermediate product from the multiplier 106 and an intermediate column result from the cache 104. The accumulator 108 includes an accumulation register 110. In certain implementations, the accumulation register 110 can be a temporary data store of at least 2n + 2 bits (where n is the number of bits in a word). In other implementations, the accumulation register 110 can be wider than 2n + 2 bits.
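As a plausibility check on the 2n + 2 bit figure, the following bound assumes that a single column step adds four terms to the register: the carry c left over from the previous column, one intermediate column result word r, and two segment products p₁ and p₂. This list of terms is an assumption made for the illustration; the passage above does not spell it out. Writing W = 2^n:

```latex
\begin{aligned}
r &\le W - 1, \qquad p_1,\, p_2 \le (W-1)^2, \qquad
c \le 4W - 1 \quad \text{(register value below } 2^{2n+2} = 4W^2 \text{ before the shift)}, \\
c + r + p_1 + p_2 &\le (4W - 1) + (W - 1) + 2(W-1)^2 \;=\; 2W^2 + W \;<\; 4W^2 = 2^{2n+2}.
\end{aligned}
```

Under these assumptions a register of 2n + 2 bits does not overflow, which also keeps the carry bound used in the first line valid for the next column.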

In some implementations, the accumulator 108 can be configured to accumulate the inputs in the accumulation register 110. For example, the accumulator 108 can add the intermediate product (from the multiplier 106) to a value stored in the accumulation register 110 and then store the accumulation result in the accumulation register 110. The accumulator 108 can also be configured to reset a value stored in the accumulation register 110. For example, the accumulator 108 can reset the accumulation register 110 to 0. In another example, the accumulator 108 can reset the accumulation register 110 to an upper word of the accumulated value.

The accumulator 108 is connected to a memory element, such as a random access memory (RAM) 112. In some implementations, the accumulator 108 can transfer one-word-wide data to the RAM 112. In an example operation, the accumulator 108 can extract a lower word of the accumulated value to be written to the RAM 112. The accumulator 108 can then reset the accumulation register 110 by storing only the upper word of the accumulated value in the accumulation register 110.
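The accumulate, write-back, and reset behavior described for the accumulator can be modeled in a few lines. This is a software sketch under stated assumptions, not the circuit: the class name `Accumulator`, its method names, the use of a dictionary as a stand-in for the RAM, and the interpretation of "upper word" as all register bits above the lower word are hypothetical choices made for illustration.

```python
class Accumulator:
    """Software model of an accumulation register that is 2n + 2 bits wide."""

    def __init__(self, word_bits=32):
        self.word_bits = word_bits
        self.word_mask = (1 << word_bits) - 1
        self.register_limit = 1 << (2 * word_bits + 2)  # first value that no longer fits
        self.register = 0

    def accumulate(self, value):
        """Add an intermediate product (or column result) to the register."""
        total = self.register + value
        if total >= self.register_limit:
            raise OverflowError("accumulated value exceeds 2n + 2 bits")
        self.register = total

    def clear(self):
        """Reset the register to 0."""
        self.register = 0

    def write_low_word(self, ram, address):
        """Write the lower word to memory, then keep only the upper part."""
        ram[address] = self.register & self.word_mask
        self.register >>= self.word_bits
```

As a usage example, after `acc = Accumulator()` and `acc.accumulate(0xFFFFFFFF * 0xFFFFFFFF)`, calling `acc.write_low_word(ram, 0)` on an empty dictionary `ram` stores `0x00000001` at address 0 and leaves `0xFFFFFFFE` in the register.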

The RAM 112 can transfer data to the cache 104. As shown, the RAM 112 can transfer two one-word-wide operand segments to the cache 104. Because a multiplication cycle (described below) requires only two reads from the RAM 112 per column, the RAM 112 can include two memory interfaces. In one implementation, the RAM 112 is a dual-port RAM, while in another implementation the RAM 112 is a memory module with two single-port RAMs. Although a RAM is used in this example, other read-write memory, such as a flash memory, a read-only access memory, or another read-write data storage device, can be used in other implementations.

The system 100 includes a state machine 114 for controlling the operations of the multiplication circuit 102. For example, the state machine 114 can control the memory accesses from the cache 104 to the RAM 112. As shown, the state machine 114 receives status signals from the multiplication circuit 102. Based on the status signals, the state machine 114 can determine a current state of the multiplication circuit 102. Based on the current state, the state machine 114 can then transmit control signals to the RAM 112 and/or the multiplication circuit 102. In one example, the state machine 114 can receive the status signals to determine whether the cache 104 has finished reading the operand segments from the RAM 112. If the cache 104 has finished reading the operand segments, the state machine 114 can, for example, transmit control signals to the multiplication circuit 102 to multiply the operand segments using the multiplier 106. Some examples of operation sequences of the state machine 114 are described below with reference to FIG. 3.

In some implementations, the state machine 114 can be a digital logic circuit (for example, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), discrete digital circuit components, or a combination thereof) that includes a hardware state machine structure for controlling the multiplication circuit 102. In other implementations, the state machine 114 can be a computer program that is processed by the system 100 to operate the multiplication circuit 102. In a further implementation, the state machine 114 can be a processor that executes software instructions to carry out operation sequences for multiplying two operands using the multiplication circuit 102.

In various examples, the state machine 114 can be an operation sequencer configured to control access to the RAM 112 by the cache 104 and to control the sequence of multiply and accumulate operations performed by the multiplier 106 and the accumulator 108, respectively. In one example, the state machine 114 can instruct the multiplication circuit 102 to multiply two word-wide operands using the multiplier 106. For example, the state machine 114 can specify two word-wide operands from the cache 104 for multiplication by the multiplier 106. In another example, the state machine 114 can instruct the multiplication circuit 102 to accumulate a result from the multiplier 106 in the accumulation register 110. For example, the accumulator 108 can add the two-word-wide multiplication result to an existing value in the accumulation register 110 and store the accumulation result in the accumulation register 110. In another example, the state machine 114 can instruct the multiplication circuit 102 to store a word (for example, an upper word or a lower word) from the accumulation register 110 in the RAM 112. For example, the state machine 114 can cause the accumulator 108 to extract an upper word and/or a lower word from the accumulation register 110 and store the extracted data word or words in the RAM 112. In another example, the state machine 114 can instruct the multiplication circuit 102 to reset the accumulation register 110. For example, the state machine 114 can cause the accumulator 108 to extract an upper word from the accumulation register 110 and reset the accumulation register 110 to the extracted upper word.

Illustrative Example of the Operation of the Multiplication Circuit

With reference to FIG. 2, operations of the multiplication circuit 102 are described below by way of an illustrative example that uses the operands A and B as defined above. Each operand A and B includes contiguous, ordered, word-wide operand segments A_j and B_i, for example A₀ through A₃ and B₀ through B₃ above. Each word-wide operand segment is characterized by a specific weight j or i. The weight j for each A word-wide operand segment is an integer from 0 to k, where k is the maximum weight, which in this example is 3. The weight i for each B word-wide operand segment is an integer from 0 to m, where m is the maximum weight, which in this example is 3.

FIG. 2 is a schematic representation of a multiplication matrix 200 that includes m + 1 rows and k + m + 2 columns. Each row has a weight x, and each column has a weight y. In this example, the row weights range from 0 to 3 and the column weights range from 0 to 7. The multiplication operations are performed one pair of rows at a time rather than one row at a time. For each pair of rows, a pair of corresponding word-wide operand segments B_i is read from memory, and word-wide operand segment pair multiplication operations (A_j·B_i) are performed iteratively for each column that contains populated cells for the given pair of rows. As such, performing the multiplication operations for each column requires a maximum of two additional memory read operations and one memory write operation. Advantageously, an efficient memory interface can be used in the system, for example a dual-port RAM or two single-port RAMs.
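To make the layout of the matrix concrete, the following sketch lists which products A_j·B_i fall into which column for one pair of rows. It assumes, consistent with the weights described above, that the cell in row i for segment A_j lies in the column of weight i + j; the function name `row_pair_cells` is hypothetical.

```python
def row_pair_cells(i, k):
    """Map each column weight to the (j, row) products A_j * B_row of rows i and i + 1."""
    cells = {}
    for row in (i, i + 1):
        for j in range(k + 1):
            # A product A_j * B_row is assumed to sit in the column of weight row + j.
            cells.setdefault(row + j, []).append((j, row))
    return cells


k = m = 3                        # maximum weights from the example
rows, cols = m + 1, k + m + 2    # matrix dimensions as stated in the text
first_pair = row_pair_cells(0, k)
```

For the example values, `first_pair` has populated columns of weights 0 through 4, and each column holds at most two products.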

The state machine 114 can operate the multiplication circuit 102 to perform the word-wide operand segment pair multiplication operations represented by the multiplication matrix 200. The state machine 114 chooses to perform multiplication operations on one pair of rows of the multiplication matrix 200 at a time. After the multiplication operations for that pair of rows are completed, the state machine 114 selects another pair of rows, until all word-wide operand segment pair multiplications are completed.

In the illustrated example, the state machine 114 chooses to perform the word-wide operand segment multiplications in the sequence indicated by arrows in FIG. 2. As shown, the state machine 114 can select rows 202, 204 in the multiplication matrix 200. After those multiplication operations are completed, the state machine 114 selects rows 206, 208. In one implementation, the state machine 114 selects pairs of rows according to a numerical sequence of increasing or decreasing weights of the rows included in each pair. For example, the state machine 114 can select a pair of rows with weights t and t + 1, followed by a pair of rows with weights t + 2 and t + 3.

In another implementation, however, the row pairs can also be selected at random. Randomizing the sequence of row-pair computations can provide improved security for the data used in the multiplication operations.
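The two selection strategies, sequential and random, can be sketched as follows. The function `row_pair_order` and its parameters are hypothetical, the sketch assumes an even number of rows so that every row belongs to exactly one pair, and Python's general-purpose `random` module stands in for whatever randomness source a hardened implementation would use.

```python
import random


def row_pair_order(m, randomize=False, rng=None):
    """Return the row pairs (t, t + 1) for rows 0..m, optionally in random order."""
    # Assumption of this sketch: m + 1 is even, so the pairs cover all rows.
    pairs = [(t, t + 1) for t in range(0, m, 2)]
    if randomize:
        # `rng` may be a random.Random instance; the module itself is the default.
        (rng or random).shuffle(pairs)
    return pairs


sequential = row_pair_order(3)
shuffled = row_pair_order(3, randomize=True, rng=random.Random(0))
```

With a random order, which pass produces the final result of a column depends on the order chosen, so the bookkeeping of intermediate results has to follow the selected sequence.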

Referring again to FIG. 2, when the rows 202, 204 are selected, the state machine 114 can instruct the multiplication circuit 102 to read a pair of corresponding word-wide operand segments of B. That is, the word-wide operand segments of B with the corresponding weights, which in this example are B₀ and B₁, are read into the cache 104. The state machine 114 can compute the cells in the rows 202, 204 by iteratively performing word-wide operand segment pair multiplication operations for each column from weight 0 to weight 4.

In one example, the state machine 114 can first instruct the cache 104 to read an operand segment A₀ from the RAM 112. The state machine 114 can then instruct the multiplication circuit 102 to compute an intermediate product A₀·B₀. The multiplication circuit 102 stores the intermediate product in the accumulation register 110. The state machine 114 can instruct the multiplication circuit 102 to write the lower word (for example, the n least significant bits) of the accumulation register 110 to the RAM 112 as the final column result (R₀) for column 0, since no further values remain to be computed in column 0, either for this pair of rows or for any other. The state machine 114 can reset the accumulation register 110 to the upper word in the accumulation register 110, where the upper word is a carry value (C₀) that is used in computing a result for the next column 254. Computing column 0 requires only one read operation (reading A₀) from the RAM 112 and one write operation (writing R₀) to the RAM 112.

Using the carry value C₀ and an additional operand segment A₁ read from the RAM 112, the state machine 114 can use the multiplication circuit 102 to compute a result for the next column 254. The state machine 114 instructs the cache 104 to read A₁ from the RAM 112. The multiplier 106 can then compute an intermediate product A₀·B₁, and the accumulator 108 can accumulate A₀·B₁ with the carry value C₀ in the accumulation register 110. As such, the accumulation register 110 stores the sum of the carry value C₀ and the product A₀·B₁. The multiplier 106 further computes an intermediate product A₁·B₀, which is accumulated in the accumulation register 110. After accumulating this product, the accumulator 108 can write the lower word of the accumulation register 110 to the memory 112 as the final column result (R₁) and reset the accumulation register 110 to its upper word as the carry value C₁ for the next column 256. In summary, for the multiplication cycle that computes column 1:
Read operations: read A₁ (A₀ was read previously and is in the cache);
Write operations: write R₁;
R₁ = lower word of (C₀ + A₀·B₁ + A₁·B₀); and
C₁ = upper word of (C₀ + A₀·B₁ + A₁·B₀).
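The column 0 and column 1 cycles summarized above can be checked with ordinary integer arithmetic. The following is a minimal sketch assuming 32-bit words; the operand values are arbitrary and chosen only for illustration.

```python
WORD_BITS = 32                      # assumed word size n
MASK = (1 << WORD_BITS) - 1

A = [0x89ABCDEF, 0x01234567]        # A_0, A_1 (arbitrary example words)
B = [0xFFFFFFFF, 0x0000000F]        # B_0, B_1 (arbitrary example words)

# Column 0: a single product, no incoming carry.
acc = A[0] * B[0]
R0 = acc & MASK                     # lower word is the final result of column 0
C0 = acc >> WORD_BITS               # upper word carries into column 1

# Column 1: carry C_0 plus the two products whose weights sum to 1.
acc = C0 + A[0] * B[1] + A[1] * B[0]
R1 = acc & MASK                     # R_1 = lower word
C1 = acc >> WORD_BITS               # C_1 = upper bits, carried into column 2
```

The words `R0` and `R1` are exactly the two least significant words of the full product, since no other cell of the matrix contributes to columns 0 and 1.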

The sum of the weights of the word-wide operand segments in each product equals the weight of the corresponding column. In the example above, for column 1 254 with a weight of 1, the weights of the word-wide operand segments sum to 0 + 1 = 1 and 1 + 0 = 1 for the two multiplication operations.

For column 2 256, the accumulator 108 can accumulate the multiplication products A₂·B₀ and A₁·B₁ and the carry value C₁ from column 254. Because the rows 202, 204 are not the last rows to be computed for column 256, the accumulator 108 writes the lower word of the accumulation result as an intermediate column result for column 2 (Int₂). In summary, for the multiplication cycle that computes column 2:
Read operations: read A₂ (A₁ was read previously and is in the cache);
Write operations: write Int₂;
Int₂ = lower word of (C₁ + A₁·B₁ + A₂·B₀); and
C₂ = upper word of (C₁ + A₁·B₁ + A₂·B₀).

Similarly, for column 3 258 the accumulator 108 writes the lower word of the accumulation result to the RAM 112 as intermediate column result Int₃, and for column 4 260 the lower word of the accumulation result is written to the RAM 112 as intermediate column result Int₄. At this point, because there are no further word-wide operand segments of A to read from the memory 112, the state machine 114 instructs the accumulator 108 to write the upper word in the accumulation register 110 to the RAM 112 as an intermediate result (Int₅) for the next column, that is, column 5 262.

The state machine 114 can reset the data stored in the multiplication circuit 102. The state machine 114 can reset the value stored in the accumulation register 110. In some examples, the state machine 114 can selectively clear values stored in the cache 104.

Next, the state machine 114 can select another pair of rows that has not previously been selected. In this example, the state machine 114 selects the rows 206, 208. The state machine 114 instructs the cache 104 to read the operand segments of B corresponding to the rows 206, 208 (that is, B₂ and B₃) into the cache 104. In one implementation, B₂ and B₃ can replace the values B₀ and B₁ in the cache 104 to reduce the size requirements of the cache 104.

The state machine 114 reads the value A₀ from the RAM 112 into the cache 104. The sum of the weights of A₀ and B₂ is the weight of the first column to be computed, that is, column 2 256. Using this predetermined weight, the state machine 114 can check whether an intermediate result of weight 2 (Int₂) is available in the RAM 112. As described above, Int₂ was computed previously and is stored in the RAM 112, and it is read into the cache 104. In one implementation, the state machine 114 can instruct the accumulator 108 to reset the accumulation register 110 to Int₂.

By accumulating Int₂ and the multiplication product A₀·B₂, the multiplication circuit 102 obtains a final result for column 2 256 (R₂) and a carry value (C₂) for computing the next column 3 258. That is, after accumulating, the state machine 114 can instruct the multiplication circuit 102 to store the lower n-bit word of the accumulation result as R₂ and to reset the accumulation register 110 to the upper word of the accumulation result as the carry value C₂. In this example the lower word is a final column result, because no additional rows in column 2 remain to be computed. In summary, for the multiplication cycle that computes column 2 for rows 2 and 3:
Read operations: read Int₂; read A₀;
Write operations: write R₂;
R₂ = lower word of [(A₀·B₂) + Int₂];
C₂ = upper word of [(A₀·B₂) + Int₂].
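The role of the intermediate result Int₂ can be illustrated numerically: the first pass (rows 0 and 1) leaves a partial word for column 2, and the second pass (rows 2 and 3) completes it with A₀·B₂. This is a sketch under the assumption of 32-bit words, with arbitrary operand values.

```python
WORD_BITS = 32                      # assumed word size n
MASK = (1 << WORD_BITS) - 1

A = [0x76543210, 0xFEDCBA98, 0x89ABCDEF, 0x01234567]   # A_0..A_3 (arbitrary)
B = [0x0F1E2D3C, 0x4B5A6978, 0x8796A5B4, 0xC3D2E1F0]   # B_0..B_3 (arbitrary)

# First pass over rows 0 and 1, columns 0 to 2.
C0 = (A[0] * B[0]) >> WORD_BITS
C1 = (C0 + A[0] * B[1] + A[1] * B[0]) >> WORD_BITS
acc = C1 + A[1] * B[1] + A[2] * B[0]
Int2 = acc & MASK                   # stored in RAM as intermediate result for column 2

# Second pass over rows 2 and 3 starts at column 2 with Int_2 as initial value.
acc = Int2 + A[0] * B[2]
R2 = acc & MASK                     # final result for column 2
C2 = acc >> WORD_BITS               # carry into column 3 of the second pass
```

`R2` matches word 2 of the full product because the lower word of a sum does not depend on the order in which its terms are accumulated.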

Thus, two read operations and one write operation per column are required for this multiplication cycle. This does not include reading B₂ and B₃, since these are read at the outset, placed in the cache, and used throughout the computations for rows 2 and 3. The maximum number of memory operations required in computing the multiplication matrix 200 is two reads and one write per column, as the example illustrates. As such, the RAM 112 can be implemented either as a dual-port memory or as two single-port memories.
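The bound of two reads and one write per column can be tallied with a short bookkeeping sketch. It assumes that row pairs are processed in increasing order, that the previously read segment of A stays in the cache, that the reads of the B segments are not counted (as stated above), and that the final upper word of each pass is counted as the write of the following column. The function name `memory_ops_per_column` is hypothetical.

```python
def memory_ops_per_column(k, m):
    """Return {(first_row_of_pair, column): (reads, writes)} for increasing pair order."""
    stored = set()   # column weights that already hold a word in RAM
    ops = {}
    for i in range(0, m, 2):
        for col in range(i, i + k + 2):
            j = col - i
            # One read for a new segment A_j (if any), one for an existing Int word.
            reads = int(j <= k) + int(col in stored)
            ops[(i, col)] = (reads, 1)
            stored.add(col)
        # The remaining upper word is written as the word of the next column.
        ops[(i, i + k + 2)] = (0, 1)
        stored.add(i + k + 2)
    return ops


ops = memory_ops_per_column(3, 3)
max_reads = max(reads for reads, _ in ops.values())
max_writes = max(writes for _, writes in ops.values())
```

For the 4 by 4 segment example, column 2 of the second pass needs two reads (A₀ and Int₂) and one write (R₂), which agrees with the summary above.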

As shown in FIG. 2, the state machine 114 can similarly instruct the multiplication circuit 102 to compute R₃ by accumulating the carry value C₂, A₁·B₂, A₀·B₃ and Int₃. For example, the state machine 114 can instruct the multiplication circuit 102 to read A₁ and Int₃ from the RAM 112. The state machine 114 can then instruct the multiplication circuit 102 to accumulate Int₃ into the accumulation register 110. After Int₃ has been accumulated, the state machine 114 can instruct the multiplication circuit 102 to compute and accumulate the multiplication products A₁·B₂ and A₀·B₃ in sequence. By extracting the lower word and the upper word of the accumulation result, the state machine 114 can instruct the accumulator 108 to write the lower word of the accumulation result as R₃ and to reset the accumulation register 110 to the remaining upper bits of the accumulation result as the carry value C₃.

Similarly, the state machine 114 can operate the multiplication circuit 102 to compute the final column results R₄ and R₅. For example, the multiplication circuit 102 can compute R₄ by accumulating the carry value C₃ from column 3 258 with Int₄, A₂·B₂ and A₁·B₃. The multiplication circuit 102 can compute R₅ by accumulating the carry value C₄ from column 4 260 with Int₅, A₃·B₂ and A₂·B₃.

Because no intermediate column result is available for column 6 264, the state machine 114 can instruct the multiplication circuit 102 to accumulate A₃·B₃ and the carry value C₅ from column 5 262. After accumulating, the multiplication circuit 102 can write the lower word of the accumulation result to the RAM 112 as R₆. At this point, the word-wide operand segment multiplication operations for the rows 206, 208 are complete, and no further rows remain to be computed. The remaining upper bits of the accumulation result are therefore written to the RAM 112 as the final result for column 7 266, that is, as R₇.

The multiplication result of A and B can be read from the RAM 112 by computing:
R = R₀ + R₁·2³² + R₂·2⁶⁴ + R₃·2⁹⁶ + R₄·2¹²⁸ + R₅·2¹⁶⁰ + R₆·2¹⁹² + R₇·2²²⁴
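The weighted sum above is a plain positional recombination of the result words. A minimal sketch, assuming 32-bit words as implied by the exponents in the formula; the function name `assemble_result` and the sample words are illustrative only.

```python
WORD_BITS = 32   # word size implied by the exponents 32, 64, ..., 224


def assemble_result(result_words):
    """Combine R_0, R_1, ... into one integer, R_c weighted by 2**(32 * c)."""
    return sum(word << (WORD_BITS * weight) for weight, word in enumerate(result_words))


# Illustrative words only: R_0 = 1, R_1 = 0, R_2 = 2.
example = assemble_result([1, 0, 2])
```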

Example of performing calculations for a multiplication matrix

FIG. 3 shows an implementation of a process 300 that can be used to perform the calculations for a multiplication matrix using the system 100. For illustrative purposes, the process 300 is described in connection with the example explained above, in which the example multiplication matrix 200 is that of FIG. 2. The process 300 begins at step 302 with receiving a first operand having word-wide operand segments A₀, A₁, A₂, ..., A_k. For example, the system 100 can receive a first operand A with operand segments A₀–A₃, that is, with k = 3. Next, at step 304, the process 300 includes receiving a second operand B having word-wide operand segments B₀, B₁, B₂, ..., B_m. For example, the system 100 can receive a second operand B with operand segments B₀–B₃, that is, with m = 3.

At step 306, the process 300 includes selecting a pair of rows, row_i and row_{i+1}, that have not previously been computed (where i ≤ m − 1). For example, the state machine 114 can select the rows 202, 204 in order to perform multiplication operations using the operand segments. The process 300 then includes reading a pair of word-wide operand segments B_i and B_{i+1} from memory at step 308. For example, the cache 104 can read a pair of word-wide operand segments from the RAM 112. At step 310, the process 300 includes setting j = 0; that is, the state machine 114 can select the weight j of A_j so that it is initially 0.

The process 300 includes reading A_j from memory at step 312. For example, the cache 104 can read the operand segment A₀ from the memory 112. At step 314, the process 300 includes determining whether j = 0. If j = 0, the process 300 includes performing the multiplication g = A_j·B_i at step 316. For example, the multiplication circuit 102 can perform the multiplication A₀·B₂ for column 2 256 when the rows 206, 208 are selected. If j ≠ 0, the process 300 includes performing the multiplication g = A_j·B_i + A_{j−1}·B_{i+1} at step 318. For example, the multiplication circuit 102 can perform the multiplication A₁·B₀ + A₀·B₁ in column 254 to compute R₁ when the rows 202, 204 are selected.

After performing the multiplication at step 316 or 318, the process 300 includes determining at step 320 whether Int_{i+j} (an intermediate column result with weight i + j) exists. For example, the state machine 114 can check whether Int₂ is present in the RAM 112. If Int_{i+j} exists, the process 300 includes reading Int_{i+j} from memory at step 322. In the example described above, the cache 104 can read Int₂ when the rows 206, 208 are selected, because column 256 was computed previously (in the rows 202, 204). If Int_{i+j} does not exist, the process 300 includes setting Int_{i+j} to 0 at step 324.

After reading Int_{i+j} from the memory in step 322 or setting Int_{i+j} to 0 in step 324, the process 300 includes, in step 326, accumulating g + Int_{i+j} + C_{i+j-1} (where C_{i+j-1} is the carry from the previous column multiplication). For example, the accumulator 108 can accumulate Int₃, A₁·B₂, A₀·B₃ and a carry C₂ from the multiplication of column 2 256 to compute a column result for column 3 258. After accumulating, the process 300 includes determining, in step 328, whether more rows remain to be computed for Col_{i+j}. For example, the state machine 114 can determine whether more rows in the matrix 200 still need to be populated for Col_{i+j}. If no further rows are to be computed for Col_{i+j}, the process 300 includes writing the lower word of the accumulation result as the final result (R_{i+j}) for Col_{i+j} in step 330. For column 1 254, for example, the accumulator 108 can write the lower word of the accumulation result of A₁·B₀, A₀·B₁ and the carry C₀ from column 0 252 as R₁, because no further rows remain to be computed for column 1 254. If more rows remain to be computed for Col_{i+j}, the process 300 includes writing the lower word of the accumulation result as an intermediate result (Int_{i+j}) in step 332. When computing rows 202, 204, for example, the accumulator 108 can write the lower word of the accumulation result of A₃·B₀, A₂·B₁ and the carry C₂ from column 256 as Int₃ (and not as R₃), because more rows remain to be computed for column 3 258.
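The accumulate-and-split step (steps 326 to 334) can be illustrated with a minimal Python sketch. It assumes the 32-bit word size of the running example; the names `WORD_BITS`, `WORD_MASK` and `accumulate_column` are hypothetical and do not come from the patent, and Python's unbounded integers stand in for the accumulator 108.

```python
WORD_BITS = 32                       # n, assumed word size of the running example
WORD_MASK = (1 << WORD_BITS) - 1


def accumulate_column(products, intermediate, carry_in):
    """Accumulate one column for one pair of rows.

    products     -- the one or two word-by-word products g for this column
    intermediate -- Int_{i+j} read from memory, or 0 if none exists yet
    carry_in     -- C_{i+j-1}, the carry left over from the previous column
    Returns (low_word, carry_out).
    """
    total = sum(products) + intermediate + carry_in
    low_word = total & WORD_MASK     # written as R_{i+j} or as Int_{i+j}
    carry_out = total >> WORD_BITS   # remaining upper bits become C_{i+j}
    return low_word, carry_out


# Illustrative call: two products, no intermediate result, no incoming carry.
low, carry = accumulate_column([0x33333333 * 0x88888888,
                                0x44444444 * 0x77777777], 0, 0)
```

Whether the low word is stored as a final or an intermediate result depends only on whether further rows remain for that column, so the sketch leaves that decision to the caller.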

The process 300 includes setting the remaining upper bits of the accumulation result as the carry (C_{i+j}) in step 334. For example, the accumulator 108 can reset the accumulation register 110 to the upper word of the value currently stored in the accumulation register 110. In step 336, the process 300 includes determining whether j > k. If j ≤ k, step 312 is repeated. If j > k, the process 300 includes, in step 340, performing the multiplication g = A_j·B_{i+j}. For example, when performing the multiplication for column 6 264 (where j = k = 3), the multiplication circuit 102 can compute A₃·B₃.

In step 342, the process 300 includes determining whether more rows remain to be computed. For example, the state machine 114 can check whether any row in the matrix 200 has not yet been populated. If more rows remain to be computed, the process 300 includes writing the lower word of the accumulation result as an intermediate result (Int_{i+j}) in step 344. The process 300 then includes, in step 346, writing the upper word of the accumulation result as an intermediate result (Int_{i+j+1}), and the process loops back to step 306 and continues. For example, if rows 202, 204 are selected, the multiplication circuit 102 can write the lower word in the accumulation register 110 as Int₄ and the upper word in the accumulation register 110 as Int₅ after the multiplication operations for column 4 260 have been performed.

If, in step 342, no further rows remain to be computed, the process 300 includes, in step 348, writing the lower word of the accumulation result as the final result (R_{i+j}). The process 300 then includes, in step 350, writing the upper word of the accumulation result as the final result R_{i+j+1}, whereupon the process 300 ends. When computing column 264 after rows 202, 204 have been processed, for example, the accumulator 108 can write the lower word in the accumulation register 110 as R₆ and the upper word in the accumulation register 110 as R₇. The multiplication result of A and B can then be read from the RAM 112 by computing: R = R₀ + R₁·2³² + R₂·2⁶⁴ + R₃·2⁹⁶ + R₄·2¹²⁸ + R₅·2¹⁶⁰ + R₆·2¹⁹² + R₇·2²²⁴
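As an illustration only, the row-pair procedure and the final recombination of R can be sketched in Python. The sketch assumes 32-bit words, an even number of B segments, and row pairs taken in ascending order of weight. The operand values are the A and B used in the A·B + Z example of this document. The function names and the list `memory` (standing in for the RAM 112) are hypothetical, not taken from the patent.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def split_words(value, count):
    """Split an integer into `count` words, least significant word first."""
    return [(value >> (WORD_BITS * w)) & WORD_MASK for w in range(count)]


def multiply_by_row_pairs(a_words, b_words):
    """Multiply two word arrays, processing two rows (two B words) at a time."""
    k = len(a_words) - 1
    assert len(b_words) % 2 == 0, "sketch assumes an even number of rows"
    memory = [0] * (len(a_words) + len(b_words))   # holds Int / R words
    for i in range(0, len(b_words), 2):
        b_first, b_second = b_words[i], b_words[i + 1]   # read once per pair
        carry = 0
        previous_a = 0                                    # cached A_{j-1}
        for j in range(k + 1):
            a_j = a_words[j]                              # one A read per column
            total = a_j * b_first + previous_a * b_second + memory[i + j] + carry
            memory[i + j] = total & WORD_MASK             # one write per column
            carry = total >> WORD_BITS
            previous_a = a_j
        # Last column of the pair (j > k): only the second row contributes.
        total = previous_a * b_second + carry
        memory[i + k + 1] = total & WORD_MASK
        memory[i + k + 2] = total >> WORD_BITS
    return memory


A = 0x11111111222222223333333344444444
B = 0x55555555666666667777777788888888
r_words = multiply_by_row_pairs(split_words(A, 4), split_words(B, 4))

# R = R0 + R1*2^32 + ... + R7*2^224
R = sum(word << (WORD_BITS * w) for w, word in enumerate(r_words))
```

Writing the last two words of each pair without reading them first is only safe here because the pairs are processed in ascending order, so those positions have not been written before.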

Although some implementations of the multiplication system and of the process have been described, other implementations can also be used. In various implementations, the state machine 114 can select a pair of rows at random from the m + 1 rows. For example, the state machine 114 can include a random number generator. Using the random number generator, the state machine 114 can generate a random number representing a pair of rows in the matrix 200 that have not previously been selected. The state machine 114 can then proceed to use the randomly selected rows to perform multiplication operations. In some examples, the random selection can improve the security of the system 100. A pair of rows does not necessarily have to be a pair of adjacent rows. Any two rows can form a pair.
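One way to picture the random pairing is the short Python sketch below. It is an assumption-laden illustration, not the state machine's actual logic: the function name `random_row_pairs` is hypothetical, the row count is assumed to be even, and Python's `secrets` module stands in for the hardware random number generator.

```python
import secrets


def random_row_pairs(row_count):
    """Return all rows grouped into randomly chosen, non-repeating pairs."""
    if row_count % 2 != 0:
        raise ValueError("sketch assumes an even number of rows")
    remaining = list(range(row_count))      # rows not yet selected
    pairs = []
    while remaining:
        first = remaining.pop(secrets.randbelow(len(remaining)))
        second = remaining.pop(secrets.randbelow(len(remaining)))
        pairs.append((first, second))       # the two rows need not be adjacent
    return pairs


pairs = random_row_pairs(4)   # e.g. rows 0..3 of a four-row matrix
```

Note that when pairs are processed out of ascending order, a column may already hold an intermediate result when the pair's final carry arrives, so the surrounding multiplication logic has to add to that value rather than overwrite it.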

In some implementations, the state machine 114 can be configured to perform multiplication operations on more than two rows at a time. Such an implementation may require a larger accumulator. In one example, the system 100 can select a group of three or more rows at a time for performing multiplication operations. In some implementations, the three or more rows are selected according to a numerical sequence of increasing or decreasing weight values of the rows included in each group. In some examples, selecting several rows for multiplication can further reduce the memory accesses needed to multiply two large numbers.
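To give a rough feel for why a larger accumulator may be needed, the following Python sketch estimates the worst-case accumulator width for a given group size. It is a back-of-the-envelope model under stated assumptions, not a figure from the patent: every product, the intermediate word and the incoming carry are all taken at their maximum, and the function name `accumulator_bits` is hypothetical.

```python
def accumulator_bits(rows_per_group, word_bits):
    """Worst-case accumulator width when `rows_per_group` products are
    summed per column together with one intermediate word and the carry."""
    word_max = (1 << word_bits) - 1
    carry = 0
    while True:
        # Largest possible column sum for the current worst-case carry.
        worst = rows_per_group * word_max * word_max + word_max + carry
        next_carry = worst >> word_bits
        if next_carry == carry:          # carry bound has stabilised
            return worst.bit_length()
        carry = next_carry


bits_for_pairs = accumulator_bits(2, 32)    # two rows at a time
bits_for_triples = accumulator_bits(3, 32)  # three rows at a time
```

Under this model, two rows of 32-bit words need 65 accumulator bits, and three rows need 66.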

The steps in the process 300 and in the example described in connection with FIG. 2 can be performed in an order different from that described above. The order of the steps described in the illustrative examples is demonstrative; the desired results can also be achieved by performing some or all of the steps in a different order.

In some implementations, the techniques described above can also be used to perform a range of mathematical operations, for example A·B + Z, Z - A·B or A·B - Z, although other operations are possible. As an illustrative example, FIG. 4 shows a multiplication matrix 400 that can be executed to compute A·B + Z, where:
A = 0x11111111222222223333333344444444; and
B = 0x55555555666666667777777788888888; and
Z = 0x9999999910101010121212121414141416161616181818182020202024242424
(with 0x denoting a hexadecimal number)

The calculation can be split into machine-sized words, in this example 32-bit word-wide operand segments, as follows: A = A₀ + A₁·2³² + A₂·2⁶⁴ + A₃·2⁹⁶; and B = B₀ + B₁·2³² + B₂·2⁶⁴ + B₃·2⁹⁶; and Z = Z₀ + Z₁·2³² + Z₂·2⁶⁴ + Z₃·2⁹⁶ + Z₄·2¹²⁸ + Z₅·2¹⁶⁰ + Z₆·2¹⁹² + Z₇·2²²⁴, where:
A₀ = 0x44444444; A₁ = 0x33333333; A₂ = 0x22222222; A₃ = 0x11111111;
B₀ = 0x88888888; B₁ = 0x77777777; B₂ = 0x66666666; B₃ = 0x55555555; and
Z₀ = 0x24242424; Z₁ = 0x20202020; Z₂ = 0x18181818; Z₃ = 0x16161616;
Z₄ = 0x14141414; Z₅ = 0x12121212; Z₆ = 0x10101010; Z₇ = 0x99999999;
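The segmentation can be checked with a few lines of Python. This is only a sketch: the helper names `split_words` and `join_words` are hypothetical, and Z is assembled here from the eight listed segments Z₀ to Z₇ rather than typed in as one literal.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def split_words(value, count):
    """Split an integer into `count` 32-bit words, least significant first."""
    return [(value >> (WORD_BITS * w)) & WORD_MASK for w in range(count)]


def join_words(words):
    """Inverse of split_words: sum of word_w * 2^(32*w)."""
    return sum(word << (WORD_BITS * w) for w, word in enumerate(words))


A = 0x11111111222222223333333344444444
B = 0x55555555666666667777777788888888
a_words = split_words(A, 4)     # [A0, A1, A2, A3]
b_words = split_words(B, 4)     # [B0, B1, B2, B3]

# Z0 .. Z7 exactly as listed in the text, least significant word first.
z_words = [0x24242424, 0x20202020, 0x18181818, 0x16161616,
           0x14141414, 0x12121212, 0x10101010, 0x99999999]
Z = join_words(z_words)
```

Listing the words least significant first makes the list index equal to the weight of the segment, which matches the indices j and i used in the description.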

As can be seen in matrix 400, the matrix operations on A and B are essentially the same as the operations described above in connection with FIG. 2. In this implementation, however, when rows 0 (402) and 1 (404) are computed, the Z word-wide operand segments are added to the multiplication products in a similar way to how the intermediate values, for example Int₂, Int₃, etc., are summed with the multiplication products when rows 2 (406) and 3 (408) are computed. This means, for example, that when Col 0 (410) is computed for rows 0 (402) and 1 (404), the operations are as follows, after an initial read operation has read the values of B₀ and B₁:
Read operations: read Z₀ and A₀
Write operations: write R₀
where R₀ = lower word of (A₀·B₀ + Z₀); and
carry C₀ = upper word of (A₀·B₀ + Z₀).

The operations for computing Col 1 (412) for rows 0 (402) and 1 (404) are as follows:
Read operations: read Z₁ and A₁
Write operations: write R₁
where R₁ = lower word of (A₁·B₀ + A₀·B₁ + C₀ + Z₁); and
carry C₁ = upper word of (A₁·B₀ + A₀·B₁ + C₀ + Z₁).
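The two column computations for Col 0 and Col 1 can be replayed in Python as a plain arithmetic sketch, using the segment values listed in this example. The names `R1` and `C1` follow the column index, and a 32-bit word is assumed.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1

A0, A1 = 0x44444444, 0x33333333
B0, B1 = 0x88888888, 0x77777777   # read once, before the first column
Z0, Z1 = 0x24242424, 0x20202020

# Col 0: read Z0 and A0, write R0.
accumulated = A0 * B0 + Z0
R0 = accumulated & WORD_MASK      # lower word
C0 = accumulated >> WORD_BITS     # upper word, carried into Col 1

# Col 1: read Z1 and A1 (A0 is still cached), write R1.
accumulated = A1 * B0 + A0 * B1 + C0 + Z1
R1 = accumulated & WORD_MASK
C1 = accumulated >> WORD_BITS     # carried into Col 2
```

Because only rows 0 and 1 contribute to columns 0 and 1, R₀ and R₁ are already the two lowest words of A·B + Z.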

The remainder of the matrix can be computed in a manner similar to that described above. Accordingly, the calculations can be carried out with no more than two read operations per column for a pair of rows.
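A complete A·B + Z run can be sketched in Python as follows. The idea illustrated is that the Z words are preloaded where the intermediate results would otherwise be, so each column still needs one A read, one intermediate read and one write per pair of rows. The function names and the list `memory` are hypothetical. The explicit carry propagation after each pair is an assumption of this sketch, since the text does not spell out how a carry into a column that already holds a Z word is handled.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def split_words(value, count):
    return [(value >> (WORD_BITS * w)) & WORD_MASK for w in range(count)]


def join_words(words):
    return sum(word << (WORD_BITS * w) for w, word in enumerate(words))


def multiply_add(a_words, b_words, z_words):
    """Compute A*B + Z two rows at a time; returns the result words."""
    k = len(a_words) - 1
    assert len(b_words) % 2 == 0, "sketch assumes an even number of rows"
    assert len(z_words) == len(a_words) + len(b_words)
    memory = list(z_words)               # Z words act as initial Int values
    for i in range(0, len(b_words), 2):
        b_first, b_second = b_words[i], b_words[i + 1]
        carry = 0
        previous_a = 0
        for j in range(k + 2):           # columns i .. i+k+1
            a_j = a_words[j] if j <= k else 0
            total = a_j * b_first + previous_a * b_second + memory[i + j] + carry
            memory[i + j] = total & WORD_MASK
            carry = total >> WORD_BITS
            previous_a = a_j
        column = i + k + 2               # assumed: ripple any remaining carry
        while carry:
            if column >= len(memory):
                raise OverflowError("result does not fit in k + m + 2 words")
            total = memory[column] + carry
            memory[column] = total & WORD_MASK
            carry = total >> WORD_BITS
            column += 1
    return memory


A = 0x11111111222222223333333344444444
B = 0x55555555666666667777777788888888
Z = 0x9999999910101010121212121414141416161616181818182020202024242424
result = join_words(multiply_add(split_words(A, 4), split_words(B, 4),
                                 split_words(Z, 8)))
```

For the operand values of this example the sum fits in eight words, so the overflow branch is never taken.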

The invention and all of the functional operations described in this document can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or combinations of these. Apparatus of the invention can be implemented in a computer program product physically embodied in a machine-readable storage device for execution by a programmable processor. Method steps of the invention can be performed by a programmable processor that executes a program of instructions to perform functions of the invention by processing input data and generating output.

The invention can advantageously be implemented in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; in any case, the language can be a compiled or an interpreted language.

Suitable processors include, for example, both general-purpose and special-purpose microprocessors. In general, processors receive instructions and data from a read-only memory and/or a random-access memory. In general, a computer includes one or more mass storage devices for storing data files. Such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for physically embodying computer program instructions and data include all forms of non-volatile memory, including, for example, semiconductor memory devices such as EPROM, EEPROM and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, an ASIC (application-specific integrated circuit).

To provide for interaction with a user, the invention can be implemented on a computer system having a display device, such as a monitor or LCD screen, for displaying information to the user, and a keyboard and a pointing device, such as a mouse or a trackball, by which the user can provide input to the computer system. The computer system can be programmed to provide a graphical user interface through which the computer program interacts with users.

A number of embodiments of the invention have been described. It should be understood that various modifications can be made to them without departing from the spirit and scope of the invention. Accordingly, other embodiments are also within the scope of the following claims.

Abstract

Method and system for multiplying large numbers

Methods, apparatus and systems for multiplying large numbers are provided. A multiplication circuit is provided for computing the product of two operands (A and B), at least one of which is wider than a width associated with the multiplication circuit. Each of the operands includes contiguous, ordered, word-wide operand segments (A_j and B_i) characterized by specific weights j (an integer from 0 to k) and i (an integer from 0 to m). The multiplication circuit executes a matrix of word-wide operand segment pair multiplication operations. Multiplication operations are performed on one pair of rows at a time. For each pair of rows, a pair of corresponding B_i word-wide operand segments is read from a memory, and word-wide operand segment pair multiplication operations (A_j·B_i) are performed iteratively for each of k + 2 columns. Each column requires a maximum of two additional memory read operations and one memory write operation.

Claims

A computer-implemented method of operating a multiplication circuit to compute the product of two operands (A and B), at least one of which is wider than a width associated with the multiplication circuit, wherein each of the operands comprises one or more contiguous, ordered, word-wide operand segments (A_j and B_i) characterized by specific weights j and i, where j is an integer from 0 to k, i is an integer from 0 to m, and a word is a specified number of bits (n), and wherein the multiplication circuit executes a matrix of word-wide operand segment pair multiplication operations, the matrix comprising m + 1 rows and k + m + 2 columns, each row having a weight x and each column having a weight y, and the multiplication circuit having access to a memory, the method comprising: performing multiplication operations on one pair of rows at a time, wherein for each pair of rows, a pair of corresponding B_i word-wide operand segments is read from the memory and word-wide operand segment pair multiplication operations (A_j·B_i) are performed iteratively for each of k + 2 columns, such that each column in the matrix requires a maximum of two additional memory read operations and one memory write operation.

The method of claim 1, wherein the multiplication operations are performed on pairs of rows that are selected according to a numerical sequence of increasing or decreasing weight values of the rows included in each pair.

The method of claim 1, wherein the multiplication operations are performed on pairs of rows that are randomly selected from the m + 1 rows.

The method of claim 1, wherein a word-wide operand segment pair multiplication operation for a column having a weight y₁ comprises performing at least one multiplication operation of an A_j word-wide operand segment with a B_i word-wide operand segment, wherein the sum of the weight j of the A_j word-wide operand segment and the weight i of the B_i word-wide operand segment is equal to the weight y₁ of the column.

The method of claim 1, further comprising: computing an n-bit final column result for each of the k + m + 2 columns, wherein a final column result for a column comprises a least significant n-bit word of an accumulation of the word-wide operand segment pair multiplication operations that are performed for each of the m + 1 rows with respect to the column.

The method of claim 5, wherein computing an n-bit final column result for a column comprises, for a final pair of rows to be computed for the column, accumulating results of the word-wide operand segment pair multiplication operations, an intermediate column result (if any) for the column, and a carry value (if any) equal to an upper word of an accumulation result from the computation for a previously computed column, to obtain an accumulation result for the column, wherein the n-bit final column result comprises the least significant n bits of the accumulation result for the column.

The method of claim 5, wherein an intermediate column result for a column comprises, for a pair of rows other than the final pair of rows to be computed for the column, accumulating results of the word-wide operand segment pair multiplication operations, a previously computed intermediate column result (if any) for the column, and a carry value (if any) for a previously computed column, to obtain an intermediate accumulation result for the column, wherein the intermediate column result comprises the least significant n bits of the intermediate accumulation result for the column.

The method of claim 7, wherein an upper word of the intermediate accumulation result for the column comprises a carry value for the column, for use in an accumulation operation for a next column.

The method of claim 1, wherein the pair of B_i word-wide operand segments that is read from the memory and corresponds to the pair of rows is stored in a cache while the multiplication operations for the pair of rows are performed.

A computer-implemented method of operating a multiplication circuit to compute the product of two operands (A and B), at least one of which is wider than a width associated with the multiplication circuit, wherein each of the operands comprises one or more contiguous, ordered, word-wide operand segments (A_j and B_i) characterized by specific weights j and i, where j is an integer from 0 to k, i is an integer from 0 to m, and a word is a specified number of bits (n), wherein the multiplication circuit executes a matrix of word-wide operand segment pair multiplication operations, the matrix comprising m + 1 rows and k + m + 2 columns, each row having a weight x and each column having a weight y, and the multiplication circuit having access to a memory, the method comprising: for each pair of rows in the matrix having weights x₁ and x₂, reading a pair of B operand segments from the memory, the weight of a first B operand segment (B_x1) being equal to x₁ and the weight of a second B operand segment (B_x2) being equal to x₂, and further performing the following: (a) reading an A_j operand segment (A_j1) from the memory; (b) for a column having a weight y, reading an intermediate column result (Int_y), if any, the intermediate column result being equal to a previously determined result for the column corresponding to a different pair or pairs of rows; (c) multiplying the A_j1 operand segment by the first B operand segment (B_x1), wherein the weight (j₁) of the A_j operand segment together with the weight (x₁) of the first B operand segment is equal to the weight y of the column; (d) in the event that j > 0, multiplying a previously read A_j operand segment stored in a cache (A_j0) by the second B operand segment (B_x2), wherein the weight (j₀) of the previously read A_j0 operand segment together with the weight (x₂) of the second B operand segment is equal to the weight y of the column; (e) accumulating the results of the one or two multiplication operations, the intermediate column result Int_y (if any), and a carry value C_{y-1} (if any) for a previously computed column having a weight y - 1, to provide an accumulation result; (f) in the event that no more rows remain to be computed for the column, writing the least significant n bits of the accumulation result to the memory as the final column result R_y, and otherwise writing the least significant n bits of the accumulation result to the memory as an intermediate column result Int_y; and (g) incrementing j by 1 and repeating steps (a) through (g) until j > k; wherein steps (a) to (g) need not be performed in the order shown.

The method of claim 10, wherein the multiplication operations are performed on pairs of rows that are selected according to a numerical sequence of increasing or decreasing weight values of the rows included in each pair.

The method of claim 10, wherein the multiplication operations are performed on pairs of rows that are randomly selected from the m + 1 rows.

A computer program product, physically stored on a computer-readable medium, for operating a multiplication circuit to compute the product of two operands (A and B), at least one of which is wider than a width associated with the multiplication circuit, wherein each of the operands comprises one or more contiguous, ordered, word-wide operand segments (A_j and B_i) characterized by specific weights j and i, where j is an integer from 0 to k, i is an integer from 0 to m, and a word is a specified number of bits (n), wherein the multiplication circuit executes a matrix of word-wide operand segment pair multiplication operations, the matrix comprising m + 1 rows and k + m + 2 columns, each row having a weight x and each column having a weight y, and the multiplication circuit having access to a memory, the computer program product comprising instructions executable to cause a programmable processor to: perform multiplication operations on one pair of rows at a time, wherein for each pair of rows, a pair of corresponding B_i word-wide operand segments is read from the memory and word-wide operand segment pair multiplication operations (A_j·B_i) are performed iteratively for each of k + 2 columns, such that each column in the matrix requires a maximum of two additional memory read operations and one memory write operation.

The computer program product of claim 13, wherein the multiplication operations are performed on pairs of rows that are selected according to a numerical sequence of increasing or decreasing weight values of the rows included in each pair.

The computer program product of claim 13, wherein the multiplication operations are performed on pairs of rows that are randomly selected from the m + 1 rows.

The computer program product of claim 13, wherein a word-wide operand segment pair multiplication operation for a column having a weight y₁ comprises performing at least one multiplication operation of an A_j word-wide operand segment with a B_i word-wide operand segment, wherein the sum of the weight j of the A_j word-wide operand segment and the weight i of the B_i word-wide operand segment is equal to the weight y₁ of the column.

The computer program product of claim 13, further comprising instructions executable to cause a programmable processor to: calculate an n-bit final column result for each of the k + m + 2 columns, wherein a final column result for a column is the least significant n-bit word of an accumulation of the word-wide operand segment pair multiplication operations performed for each of the m + 1 rows with respect to the column.

The computer program product of claim 17, wherein the instructions executable to calculate an n-bit final column result for a column include instructions executable to cause a programmable processor to: for a final pair of rows in the calculation for the column, successively accumulate results of the word-wide operand segment pair multiplication operations, an intermediate column result (if any) for the column, and a carry value (if any) equal to an upper word of an accumulation result from the calculation for a previously calculated column, to obtain an accumulation result for the column, wherein the n-bit final column result comprises the least significant n bits of the accumulation result for the column.

The computer program product of claim 18, further comprising instructions executable to cause a programmable processor to: calculate an n-bit intermediate column result for a column, wherein calculating an intermediate column result for a column comprises, for a pair of rows other than the final pair of rows in the calculation for the column, successively accumulating results of the word-wide operand segment pair multiplication operations, a previously calculated intermediate column result (if any) for the column, and a carry value (if any) from a previously calculated column, to obtain an intermediate accumulation result for the column, wherein the intermediate column result comprises the least significant n bits of the intermediate accumulation result for the column.

The computer program product of claim 19, wherein an upper word of the intermediate accumulation result for the column comprises a carry value for the column for use in an accumulation operation for a next column.
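A single accumulation step of the kind recited in the preceding claims might look as follows. This is a sketch only: the name `accumulate_column` and its parameters are hypothetical, and an unbounded Python integer stands in for the accumulator, so the carry is simply everything above the low n bits.

```python
def accumulate_column(products, intermediate=0, carry_in=0, n=32):
    """Accumulate one column; return (n-bit column result, carry for next column)."""
    acc = intermediate + carry_in
    for product in products:          # up to two A_j * B_i products per row pair
        acc += product
    column_result = acc & ((1 << n) - 1)   # least significant n bits
    carry_out = acc >> n                   # upper part of the accumulation
    return column_result, carry_out
```

For example, with n = 8, two products of 0xFF · 0xFF, an intermediate column result of 0xFF, and a carry of 0x12, the call returns a column word of 19 and a carry of 509, since 19 + 509 · 256 = 130323.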

The computer program product of claim 13, wherein the pair of word-wide B_i operand segments read from the memory and corresponding to the pair of rows is stored in a cache while the multiplication operations are performed on the pair of rows.

A system for computing the product of two operands (A and B), at least one of which is wider than a multiplier circuit included in the system, each of the operands comprising one or more contiguous, ordered, word-wide operand segments (A_j and B_i) characterized by specific weights j and i, where j is an integer from 0 to k, i is an integer from 0 to m, and a word is a specified number of bits (n), the system executing a matrix of word-wide operand segment pair multiplication operations, the matrix comprising m + 1 rows and k + m + 2 columns, each row having a weight x and each column having a weight y, the system comprising: a multiplier including one or more inputs for receiving word-wide operand segments, the multiplier performing a multiplication operation on a pair of word-wide operand segments to form a 2-word-wide intermediate result; an accumulator including: one or more inputs for receiving the 2-word-wide intermediate result from the multiplier, an intermediate column result from a cache, and a carry value from the cache; and one or more outputs for providing the least significant n bits of an accumulation of the 2-word-wide intermediate result, the intermediate column result, and the carry value to the cache as an intermediate or final column result, and for providing the upper bits of the accumulation to the cache as the carry value; wherein the accumulator performs an accumulation operation on the inputs it receives; a cache for receiving word-wide operand segments from, and transmitting them to, a memory, the cache being in communication with the multiplier and the accumulator to provide inputs thereto and receive outputs therefrom; and an operation sequencer for controlling the cache's access to the memory and for controlling the sequence of multiply and accumulate operations performed by the multiplier; wherein the sequences for performing multiplication operations are each defined on a pair of rows, wherein, for each pair of rows, a pair of corresponding word-wide B_i operand segments is read from the memory into the cache and word-wide operand segment pair multiplication operations (A_j · B_i) are performed iteratively by the multiplier for each of k + 2 columns, so that each column in the matrix requires a maximum of two additional memory read operations and one memory write operation to and from the memory.
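The memory-traffic bound in this claim can be made concrete with a small software model. Everything here is an assumption for illustration: `CountingMemory` and `run_sequencer` are hypothetical names, local variables stand in for the cache (holding the B pair, the previous A segment, and the carry), and the memory layout (A, then B, then the result words) is arbitrary.

```python
class CountingMemory:
    """Hypothetical word-addressed memory that tallies reads and writes."""

    def __init__(self, words):
        self.words = list(words)
        self.reads = 0
        self.writes = 0

    def read(self, addr):
        self.reads += 1
        return self.words[addr]

    def write(self, addr, value):
        self.writes += 1
        self.words[addr] = value


def run_sequencer(a_words, b_words, n):
    """Return (result words, (max reads per column, max writes per column))."""
    k, m = len(a_words) - 1, len(b_words) - 1
    mask = (1 << n) - 1
    a_base, b_base, r_base = 0, k + 1, k + m + 2     # assumed memory layout
    mem = CountingMemory(list(a_words) + list(b_words) + [0] * (k + m + 2))
    worst_reads = worst_writes = 0
    for i in range(0, m + 1, 2):
        # The B pair is fetched once per row pair and then held in the cache.
        b_lo = mem.read(b_base + i)
        b_hi = mem.read(b_base + i + 1) if i + 1 <= m else 0
        carry = 0
        a_prev = 0                       # A segment cached from previous column
        for y in range(i, i + k + 2):
            reads0, writes0 = mem.reads, mem.writes
            j = y - i
            a_cur = mem.read(a_base + j) if j <= k else 0        # read 1
            inter = mem.read(r_base + y) if i > 0 else 0         # read 2
            acc = inter + carry + a_cur * b_lo + a_prev * b_hi
            mem.write(r_base + y, acc & mask)                    # the one write
            carry = acc >> n
            a_prev = a_cur
            worst_reads = max(worst_reads, mem.reads - reads0)
            worst_writes = max(worst_writes, mem.writes - writes0)
        if i + k + 2 <= k + m + 1:
            # Carry out of the pair's last column; a write with no reads.
            mem.write(r_base + i + k + 2, carry)
    return mem.words[r_base:], (worst_reads, worst_writes)
```

In this model no column of a row pair needs more than two reads (one A segment and one intermediate column result) and one write, because the second product of each column reuses the A segment cached from the previous column.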

The system of claim 22, further comprising: the memory, wherein the memory requires two memory interfaces.

The system of claim 22, further comprising: the memory, wherein the memory is a single dual-port RAM (random access memory).

The system of claim 22, further comprising: the memory, wherein the memory comprises two single-port RAMs.

A computer-implemented method for computing the product of two operands (A and B), at least one of which is wider than a width associated with a multiplication circuit performing the multiplication, the method comprising: constructing a matrix of word-wide operand segment pair multiplication operations, the matrix comprising m + 1 rows and k + m + 2 columns, each row having a weight x and each column having a weight y, each of the operands comprising one or more contiguous, ordered, word-wide operand segments (A_j and B_i) characterized by specific weights j and i, where j is an integer from 0 to k, i is an integer from 0 to m, and a word is a specified number of bits (n); and performing multiplication operations on one pair of rows at a time, wherein, for each pair of rows, a pair of corresponding word-wide B_i operand segments is read from a memory and word-wide operand segment pair multiplication operations (A_j · B_i) are performed iteratively for each of k + 2 columns, so that each column in the matrix requires a maximum of two additional memory read operations and one memory write operation.

The method of claim 26, wherein the multiplication operations are performed on pairs of rows that are selected according to a numerical sequence of increasing or decreasing weight values of the rows included in each pair.

The method of claim 26, wherein the multiplication operations are performed on pairs of rows that are selected from the m + 1 rows.

A system for operating a multiplier circuit to compute the product of two operands (A and B), at least one of which is wider than the multiplier circuit, each of the operands comprising one or more contiguous, ordered, word-wide operand segments (A_j and B_i) characterized by specific weights j and i, where j is an integer from 0 to k, i is an integer from 0 to m, and a word is a specified number of bits (n), wherein the multiplier circuit executes a matrix of word-wide operand segment pair multiplication operations, the matrix comprising m + 1 rows and k + m + 2 columns, each row having a weight x and each column having a weight y, the multiplier circuit having access to a memory, the system comprising: means for performing multiplication operations on one pair of rows at a time, wherein, for each pair of rows, a pair of corresponding word-wide B_i operand segments is read from the memory and word-wide operand segment pair multiplication operations (A_j · B_i) are performed iteratively for each of k + 2 columns, so that each column in the matrix requires a maximum of two additional memory read operations and one memory write operation.