Multiple Write-Port Memory
The present invention relates to electronic digital storage devices. In particular, it relates to configurations of elemental logic devices that provide digital storage, wherein the data stored within the store may be accessed and changed from two or more independent ports.
Digital storage devices are commonly employed in electronic systems for storing data. This data may be for any purpose, and may comprise executable instructions, or may comprise information required or generated by the execution of instructions, as typically takes place inside a microprocessor. The digital storage devices, also known as memory devices, can be built up by combining elemental logic devices such as AND, OR and NOT gates to produce circuit elements capable of retaining data written to them and reproducing the stored data on demand. Such storage devices typically have a plurality of locations, known as addresses, at which data may be stored, and the data at each location will comprise one or more binary digits of information. The locations are chosen using an address bus, and the data at a location is accessed by a data bus. The most common form of this storage is accessible from a single port, so that just one location may be addressed, for either writing or reading, at a time. This memory has a single address bus and a single data bus.
Some applications require access to more than one location within the memory at a given instant. Depending on the application, the second and subsequent simultaneous accesses may be for reading only, or for writing or reading data. To cater for this, further address and data bus ports are incorporated into the memory. Field Programmable Gate Arrays (FPGAs) produced by Xilinx® Inc incorporate as a logic primitive a 16-bit memory having two read ports and a single write port. Also incorporated are much larger memories, typically of several kilobits depending on the particular device, having two read and write ports, but such memories are relatively slow and inconvenient if the user wishes to store just a few bits of information in multiple write-port memory.
Multiple write-port memory, particularly when implemented using primitive logic functions, is typically much larger than single-port memory in terms of the number of elemental logic functions required for its implementation. Its implementation using discrete logic devices or a programmable logic device such as an FPGA is therefore inconvenient, because of the number of available gates taken up in adding the extra ports. This clearly reduces the number of gates available for other purposes.
One common application of a multiple write-port memory is as a scoreboard register. A scoreboard register acts as an access controller for one or more logical resources. When the resource is in use, a corresponding bit is set in the register indicating this fact. When the resource next becomes available, the corresponding bit is cleared. Before use, the resource is checked to see that its corresponding scoreboard register flag is cleared, indicating that it is available. The flag is then toggled to the set state and the resource is used. Any other attempt to use the resource in the meantime finds the scoreboard register flag set, and must wait until the flag is cleared.
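The check-then-set protocol just described can be sketched behaviourally. This is a minimal single-port model, assuming one flag per resource held in a Python list; the class `Scoreboard` and its methods `try_acquire` and `release` are hypothetical names used only for illustration, not part of the invention.

```python
class Scoreboard:
    """Behavioural model of a scoreboard register: one flag bit per resource.

    A set flag (1) means the resource is in use; a clear flag (0) means it is free.
    """

    def __init__(self, num_resources: int) -> None:
        self.flags = [0] * num_resources

    def try_acquire(self, resource: int) -> bool:
        # Check the flag first: only a clear flag may be set.
        if self.flags[resource]:
            return False  # in use: the caller must wait until the flag is cleared
        self.flags[resource] = 1  # set the flag, then the resource may be used
        return True

    def release(self, resource: int) -> None:
        # The resource is available again, so its flag is cleared.
        self.flags[resource] = 0


sb = Scoreboard(4)
print(sb.try_acquire(2))  # True: resource 2 was free
print(sb.try_acquire(2))  # False: resource 2 is already in use
sb.release(2)
print(sb.try_acquire(2))  # True again after the release
```

This sequential model hides the difficulty addressed below: when more than one instruction executes at a time, requests to set and clear flags can arrive through different ports in the same cycle, which calls for a multiple write-port register.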
Scoreboard registers are commonly used in microprocessors to indicate the state of, and control usage of, registers and execution units. Modern microprocessors can often execute more than one instruction at a time, but there can be a problem if two instructions simultaneously attempt to access the same register. As the number of registers in a typical microprocessor is not large, the scoreboard register itself does not need to be large. However, implementation using the standard multiple write-port memories available in FPGAs may mean that much of the memory is wasted, as they are typically much larger than required for this application.
According to the present invention there is provided a digital memory for storing information in bit form, characterised in that the memory includes locations for storing at least one information bit as a plurality of working bits, and combinatorial logic output circuitry for generating the information bit from the working bits. The present invention allows a multiple-port memory to be implemented using a relatively small number of additional logic functions. Having the information bit stored as a combination of working bits allows the information bit to be modified by manipulating at least one of the working bits. The working bits are preferably divided into sets, where each set consists of at least one working bit for each separately addressable location. The sets of working bits are arranged such that each set may be addressed for writing through a single port but may be addressed for reading through a plurality of ports.
Writing an information bit through one of the ports influences one of the sets of working bits. Because the value of a working bit is combined with the other, corresponding, working bits from the other sets to reproduce the information bit, the values of those other corresponding working bits need to be taken into account between receiving the information bit and storing the corresponding working bit. This is done using combinatorial logic that takes as its inputs the information bit to be written, along with the corresponding working bits from all other sets, and produces the working bit for the set currently being written to through its port.
When an information bit is read from one of the ports, the working bits at the address being read are combined through a combinatorial logic function which takes all the working bits at that address into account and provides a single output bit, this being the required information bit.
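The write and read rules above can be illustrated with a small behavioural model. It assumes XOR as the combinatorial logic (the one-bit addition given as the preferred option in the next paragraph) and models each set of working bits as a Python list; `WorkingBitMemory`, `write` and `read` are hypothetical names, and the sketch performs writes one at a time without modelling clocks.

```python
from functools import reduce
from operator import xor


class WorkingBitMemory:
    """Stores each information bit as the XOR of one working bit per set.

    Set j may only be written through port j; every set can be read by any port.
    """

    def __init__(self, num_ports: int, depth: int) -> None:
        self.sets = [[0] * depth for _ in range(num_ports)]

    def write(self, port: int, addr: int, bit: int) -> None:
        # Working bit = information bit XOR the corresponding working bits of all other sets.
        others = [s[addr] for j, s in enumerate(self.sets) if j != port]
        self.sets[port][addr] = reduce(xor, others, bit)

    def read(self, addr: int) -> int:
        # Information bit = one-bit addition (XOR) of every working bit at the address.
        return reduce(xor, (s[addr] for s in self.sets), 0)


mem = WorkingBitMemory(num_ports=2, depth=16)
mem.write(0, 5, 1)   # write 1 through the first port
print(mem.read(5))   # 1
mem.write(1, 5, 0)   # overwrite with 0 through the second port
print(mem.read(5))   # 0
```

Note that a write never touches another port's set, which is what allows each set to have a single write port.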
Preferably, the combinatorial logic used to manipulate the working bits during a read operation or a write operation comprises a one-bit addition of the logic's inputs. For a two-port system, having two sets of working bits, this may be achieved using an Exclusive OR (XOR) gate.
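Why one-bit addition (addition modulo 2, i.e. XOR) gives the correct result can be shown in three lines. The symbols are introduced here for illustration only: w_i(a) is the working bit of set i at address a, k is the number of sets, and d is the information bit written through port j.

```latex
\begin{aligned}
Q(a)   &= w_1(a) \oplus w_2(a) \oplus \cdots \oplus w_k(a) \\
w_j(a) &\leftarrow d \oplus \textstyle\bigoplus_{i \neq j} w_i(a) \quad \text{(write $d$ through port $j$)} \\
Q(a)   &= \Big(d \oplus \textstyle\bigoplus_{i \neq j} w_i(a)\Big) \oplus \textstyle\bigoplus_{i \neq j} w_i(a) = d
          \quad \text{since } x \oplus x = 0,\; x \oplus 0 = x
\end{aligned}
```

With k = 2 this is the single XOR gate arrangement: a port A write stores d XOR w_B, and a read returns w_A XOR w_B.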
The current invention is particularly suitable for implementation as a two-port system, providing a simultaneous read and write facility from two independent ports. The invention is also suitable for systems requiring more than two read or write ports. Additional ports may be added by providing storage for additional sets of working bits. The working bit storage for any additional ports must be connected to the working bit storage of the existing ports by means of combinatorial logic as discussed above. Additional ports may be read-only, or may be full read-write ports.
According to another aspect of the invention there is provided a scoreboard register having a plurality of flag bits, each of at least two of said flag bits being allocable to a logical resource so as to indicate the state of the logical resource, wherein each flag bit has associated with it a plurality of working bits, each of which may be addressed from a different port, and an output combinatorial logic function of the working bits is used to generate the flag bit.
The present invention allows a scoreboard register to be produced in an efficient manner, particularly when implemented using logic functions available on typical programmable logic devices. The present invention is particularly suitable for implementation on an FPGA.
Implementation of the current invention on an FPGA or other programmable logic device may be performed by creating a circuit description in electronic form and then transferring this electronic description to the device. The transfer will involve reformatting the electronic description into a suitable format for download to the device.
According to a further aspect of the invention there is provided a method of storing digital information in a memory having at least two write ports comprising the steps of:
receiving an information bit from a data bus associated with a first write port; adding this information bit to a set of corresponding working bits taken from memories associated with all other ports, the addition being one-bit addition; and storing the result of the addition in a memory associated with the first write port.
The invention will now be described in more detail, by way of example only, with reference to the following Figures, of which:
Figure 1 illustrates in block diagrammatic form, how a single information bit may be stored as two separate working bits;
Figure 2 illustrates in block diagrammatic form a practical implementation of the current invention using a Xilinx® FPGA, showing two read-write ports;
Figure 3 illustrates in block diagrammatic form a practical implementation of the current invention using a Xilinx® FPGA, showing two read-write ports with a third, read-only, port.
Figure 4 illustrates in block diagrammatic form a practical implementation of the current invention using a Xilinx® FPGA, showing three read-write ports.
Figure 5 illustrates in block diagrammatic form a practical implementation of the current invention using a Xilinx® FPGA, showing a system acting as a scoreboarding register.
Figure 1 shows a logic function having two inputs, A and B, and a single output, Q. Here, two D-type flip-flops 101, 102, each independently capable of storing one bit of information, have their outputs coupled together through an Exclusive OR gate 103. The flip-flops 101, 102 are each configured such that pulsing an input A, B toggles the state of the corresponding output QA, QB. The logic function is designed to store one information bit as a combination of two working bits. The flip-flop outputs QA, QB are the working bits of this system, whereas the Q output is the information bit stored as the function of the two working bits. Note that altering either of the working bits in the flip-flops 101, 102 will change the Q output. Therefore, before storing a bit in one of the flip-flops 101 or 102, the state of the other needs to be taken into account, so that the correct value is presented at the output Q. This is done using further logic not shown in this Figure.
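A behavioural sketch of Figure 1 follows, assuming each flip-flop is wired to toggle on every input pulse. The `store` helper stands in for the "further logic not shown in this Figure"; it is a hypothetical illustration of taking the other flip-flop's state into account, not the circuit used in the invention.

```python
class ToggleFlipFlop:
    """Flip-flop wired so that each pulse on its input toggles its output."""

    def __init__(self) -> None:
        self.q = 0

    def pulse(self) -> None:
        self.q ^= 1


ff_a, ff_b = ToggleFlipFlop(), ToggleFlipFlop()  # flip-flops 101 and 102


def q_out() -> int:
    # Exclusive OR gate 103: the information bit is the XOR of the two working bits.
    return ff_a.q ^ ff_b.q


def store(ff: ToggleFlipFlop, bit: int) -> None:
    # Hypothetical write logic: pulse the chosen flip-flop only if Q differs
    # from the target, so the state of the other flip-flop is accounted for.
    if q_out() != bit:
        ff.pulse()


print(q_out())  # 0
ff_a.pulse()    # pulse input A
print(q_out())  # 1
ff_b.pulse()    # pulse input B
print(q_out())  # 0, with QA = QB = 1
```

Either input alone flips Q, which is why a write through one input must consult the other working bit first.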
Figure 2 shows a practical implementation of a memory with two independent read and write ports. It is based on the principle demonstrated using Figure 1, but is shown using higher-level functions which are available as primitives in the Xilinx® "Virtex 2" FPGA device. Here, two memories 1, 2, known as Look Up Table (LUT) Random Access Memories (RAM), act as the storage elements for the working bits. Each of these primitive LUT RAMs 1, 2 has a single interface, e.g. 3, for writing information to it, but has two interfaces, e.g. 3, 4, for reading information from it. On its own, it is therefore not useful if dual write access is required. However, combining two such devices in the manner shown allows a true dual-port memory to be produced.
Note that each LUT RAM 1, 2 acts effectively as a store for one set of working bits. As each has two read interfaces, it is able to output simultaneously two different working bits from the same set, one for each port. Each port is arranged such that its address bus, e.g. 7, goes to both LUT RAMs 1, 2: on one to the read-write interface, e.g. 3, and on the other to the read-only interface, e.g. 6. The extra read port thus provides a mechanism for allowing one set of working bits to be taken into account when writing to another set, ensuring that the correct information bit is stored or retrieved.
To illustrate the mode of operation, assume all memories are initially cleared to 0, and a logic 1 is to be written to address n using port A. The address lines (Addr A) 7 of port A are set to address n, Write Enable A (WEA) 9 is activated, and the information bit to be stored is put on Din A 8. Din A 8 acts as one of the inputs of a two-input XOR gate 10. The second input 11 comes from the output of the read-only interface 6 of LUT RAM 2, which is a logic 0. Following a clock pulse (clock inputs not shown), the logic 1 on the output of XOR gate 10 is clocked into LUT RAM 1 and acts as the working bit for that port at address n. The working bit stored in the set is therefore the XOR of the information bit and the corresponding working bit of the other set. Note that writing to a LUT RAM as provided on a Xilinx® Virtex or Virtex2 device is a synchronous operation requiring a clock input, but reading data from it is an asynchronous process not requiring a clock pulse. This enables the implementation of Figure 2 to write data in a single clock pulse: the corresponding working bit from the read-only interface 6 can be arranged to arrive at the input 11 of XOR gate 10 before the clock pulse is transmitted to LUT RAM 1.
The operation of writing to port B is carried out analogously to that of writing to port A. XOR gate 16 provides the input combinatorial logic for the port, and the working bits of port B are stored in LUT RAM 2.
Reading from address n using, say, port B works as follows. Address n is set up on Addr B 12. This takes a working bit from address n of the read-only interface 4 of LUT RAM 1, along with a working bit from address n of the read-write interface 5 of LUT RAM 2. The first of these working bits will be logic 1 due to the write operation described above, and the second will be logic 0, as nothing has yet been written to this memory device. The two LUT RAM outputs 13, 14 addressed by port B are fed to an XOR gate 15, the output of which is the port B data output. In this case, the output will be a logic 1, reflecting the logic 1 that was stored at the address using port A as described earlier.
The operation of reading from port A is carried out analogously to that of reading from port B. XOR gate 17 provides the output combinatorial logic for port A, and the working bits of port A are stored in LUT RAM 1.
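The single-clock write and asynchronous read behaviour of Figure 2 can be modelled at the cycle level. This is a sketch under stated assumptions: `LutRam` is a hypothetical stand-in for the Xilinx primitive (a list that is read without a clock and written when `clock` is called), the numerals in the comments refer to Figure 2, and when both write enables are active the two ports are assumed to target different addresses, a case the description does not address.

```python
class LutRam:
    """Hypothetical LUT RAM model: reads are asynchronous, writes happen on the clock edge."""

    def __init__(self, depth: int = 16) -> None:
        self.bits = [0] * depth

    def read(self, addr: int) -> int:
        # Serves both the read-write interface and the read-only interface.
        return self.bits[addr]

    def write(self, addr: int, bit: int) -> None:
        self.bits[addr] = bit


class DualWritePortMemory:
    """Two read-write ports built from two LUT RAMs and four XOR gates (Figure 2)."""

    def __init__(self, depth: int = 16) -> None:
        self.ram1 = LutRam(depth)  # working bits written through port A
        self.ram2 = LutRam(depth)  # working bits written through port B

    def dout_a(self, addr_a: int) -> int:
        # XOR gate 17 combines both working bits at port A's address.
        return self.ram1.read(addr_a) ^ self.ram2.read(addr_a)

    def dout_b(self, addr_b: int) -> int:
        # XOR gate 15 combines both working bits at port B's address.
        return self.ram1.read(addr_b) ^ self.ram2.read(addr_b)

    def clock(self, addr_a: int, din_a: int, we_a: bool,
              addr_b: int, din_b: int, we_b: bool) -> None:
        # XOR gates 10 and 16 settle before the edge, using the asynchronous reads.
        next_a = din_a ^ self.ram2.read(addr_a)  # read-only interface 6 of LUT RAM 2
        next_b = din_b ^ self.ram1.read(addr_b)  # read-only interface 4 of LUT RAM 1
        # Both LUT RAMs then capture their new working bits on the same edge.
        if we_a:
            self.ram1.write(addr_a, next_a)
        if we_b:
            self.ram2.write(addr_b, next_b)


n = 3
mem = DualWritePortMemory()
mem.clock(addr_a=n, din_a=1, we_a=True, addr_b=0, din_b=0, we_b=False)
print(mem.dout_b(n))  # 1, as in the walkthrough above
# Both ports write in the same cycle, to different addresses.
mem.clock(addr_a=4, din_a=1, we_a=True, addr_b=n, din_b=0, we_b=True)
print(mem.dout_a(n), mem.dout_b(4))  # 0 1
```

Computing both next working bits before committing either mirrors the fact that the reads are combinatorial and the writes share a clock edge.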
The dotted region 35 of Figure 2 indicates the logic that is not in the same slice as the other logic devices when implemented using Xilinx® Virtex or Virtex2 FPGA devices. Each slice in such a device comprises a set of logic primitives that may be configured particularly conveniently, according to the architecture of the device, into commonly used functions. Logic functions may, however, be taken from other slices if desired.
Figure 3 shows another embodiment of the current invention, where the implementation has been scaled to provide a third, read-only, port. Reading from and writing to ports A and B are carried out in an identical manner to that described above, but placing a valid address onto port C will result in the data stored at that address appearing at Dout C. The additional logic required over the two-port implementation of Figure 2 comprises extra storage LUT RAMs 18, 19 and extra output combinatorial logic 20. The extra storage is necessary to provide additional read ports for the two sets of working bits, and this storage provides its read-only outputs to XOR gate 20. LUT RAMs 18 and 19 hold data identical to that of their corresponding LUT RAMs 1 and 2 respectively.
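A behavioural sketch of Figure 3 shows how the duplicated storage serves port C. It assumes, for illustration, that writes happen one at a time and that each LUT RAM is a Python list; the class and method names are hypothetical, and only the write paths and port C's read path are shown, since ports A and B read exactly as in Figure 2.

```python
class ThreePortMemory:
    """Figure 3 sketch: ports A and B read-write, port C read-only.

    Both read interfaces of LUT RAMs 1 and 2 are already used by ports A and B,
    so shadow copies (LUT RAMs 18 and 19) receive every write made to them.
    """

    def __init__(self, depth: int = 16) -> None:
        self.ram1 = [0] * depth   # port A working bits
        self.ram2 = [0] * depth   # port B working bits
        self.ram18 = [0] * depth  # copy of LUT RAM 1, read by port C
        self.ram19 = [0] * depth  # copy of LUT RAM 2, read by port C

    def write_a(self, addr: int, bit: int) -> None:
        working = bit ^ self.ram2[addr]  # XOR gate 10
        self.ram1[addr] = working
        self.ram18[addr] = working       # same address and data as LUT RAM 1

    def write_b(self, addr: int, bit: int) -> None:
        working = bit ^ self.ram1[addr]  # XOR gate 16
        self.ram2[addr] = working
        self.ram19[addr] = working       # same address and data as LUT RAM 2

    def dout_c(self, addr: int) -> int:
        # XOR gate 20 uses only the read-only outputs of the copies.
        return self.ram18[addr] ^ self.ram19[addr]


mem = ThreePortMemory()
mem.write_a(7, 1)
print(mem.dout_c(7))  # 1
mem.write_b(7, 0)
print(mem.dout_c(7))  # 0
```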
Figure 4 shows a further embodiment of the current invention, where the implementation shown in Figure 2 has been scaled to provide three read-write ports 104, 105, 106. This works in a way analogous to the implementation of Figure 2. The three ports 104, 105, 106 require three sets of working bits, and these are provided by the three LUT RAM pairs 21, 22, 23. Pairs are needed here so that a sufficient number of independent read ports is available to provide every port with copies of the corresponding working bits from the sets belonging to the other ports. This enables the working bits from all sets to be taken into account when writing new data or reading data using any one of the ports.
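The three-port arrangement can be generalised in a cycle-level sketch to any number of read-write ports. It rests on assumptions not made in the description: all ports share one clock, each concurrently writing port uses a distinct address, and Python lists stand in for the LUT RAM pairs, so the read-interface budget that motivates the pairs is not modelled. `clock_edge` and `read` are hypothetical names.

```python
from functools import reduce
from operator import xor


def clock_edge(sets, writes):
    """Apply one clock edge to a memory with len(sets) read-write ports.

    sets:   list of working-bit lists, one per port (set j is written only by port j)
    writes: dict mapping port index to (address, information bit) for ports whose
            write enable is asserted this cycle
    """
    addrs = [addr for addr, _ in writes.values()]
    assert len(addrs) == len(set(addrs)), "concurrent writes to one address are not modelled"
    # Compute every new working bit from the pre-edge contents (asynchronous reads)...
    pending = {}
    for port, (addr, bit) in writes.items():
        others = (s[addr] for j, s in enumerate(sets) if j != port)
        pending[port] = (addr, reduce(xor, others, bit))
    # ...then commit them together, as the LUT RAMs do on the shared clock edge.
    for port, (addr, working) in pending.items():
        sets[port][addr] = working


def read(sets, addr):
    # Any port reads by XOR-ing the working bits of all sets at its address.
    return reduce(xor, (s[addr] for s in sets), 0)


sets = [[0] * 16 for _ in range(3)]                  # three ports, 16 locations
clock_edge(sets, {0: (1, 1), 1: (2, 1), 2: (3, 1)})  # all three ports write at once
print([read(sets, a) for a in (1, 2, 3)])            # [1, 1, 1]
clock_edge(sets, {2: (1, 0)})                        # the third port clears address 1
print(read(sets, 1))                                 # 0
```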
Further embodiments may be produced having additional read-only or read-write ports. The skilled person will see that additional ports may be added by scaling up the basic architecture of the present invention. Each additional write port will need storage for an associated set of working bits and logic to ensure that information bits written to the port take due account of the corresponding working bits of all other ports.
Figure 5 shows yet another embodiment of the current invention, this being a two-port scoreboarding circuit. The circuit is based upon the dual read-write port memory shown in Figure 2, with some small but important differences. Firstly, the data-in lines for ports A and B are not present: this type of circuit is not required to store arbitrary data coming in from the ports, but merely has to toggle flag bits according to requests from the ports to do so. Each flag bit in the scoreboard is used to indicate the status of a register or other logic resource in a system. If the flag bit is active (e.g. at logic 1), the resource is unavailable, whereas if the flag bit is inactive (e.g. at logic 0), the resource is free to be used.
Storage of the flag bits is performed in a similar way to storage of information bits in the multiple-port memory. Each flag bit is stored as (in this case) two working bits, and the flag bit is reproduced by applying the working bits to a combinatorial logic function, the output of which is the desired flag bit. As with the multiple-port memory, the combinatorial logic comprises a one-bit addition of the working bits.
If it is required to set a flag at some address location, indicating that the register will be in use (assuming that a check has already been done to see that it is clear), the address lines 24 of port B are set appropriately and the write-enable line B 25 is asserted. This takes the working bit from read port 28 of LUT RAM 26, which will be a logic 0, and presents it to the data-in input 29 of LUT RAM 27 through inverter 30. A logic 1 will therefore be written as a working bit.
Clearing the flag at this same address is done by driving the address lines 33 of port A appropriately and asserting the Write-Enable A line 34. This stores into LUT RAM 26 the value of the working bit taken from LUT RAM 27, currently a logic 1. Reading this address from any port after this operation will take the working bits from both LUT RAMs 26, 27 (currently both at logic 1) and combine them using the one-bit adder, or XOR gate, 31 or 32, to produce the flag bit value of logic 0.
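The set and clear operations of Figure 5 can be sketched behaviourally as follows. This is a minimal model, assuming Python lists for LUT RAMs 26 and 27 and writes applied one at a time; `XorScoreboard`, `set_via_b` and `clear_via_a` are hypothetical names, and the check that a flag is clear before setting it is assumed to be done elsewhere, as in the description.

```python
class XorScoreboard:
    """Figure 5 sketch: each flag bit is the XOR of two working bits.

    Port B can only set a flag: inverter 30 writes NOT(LUT RAM 26 bit) into LUT RAM 27.
    Port A can only clear a flag: it copies the LUT RAM 27 bit into LUT RAM 26.
    """

    def __init__(self, depth: int = 16) -> None:
        self.ram26 = [0] * depth  # working bits written through port A
        self.ram27 = [0] * depth  # working bits written through port B

    def flag(self, addr: int) -> int:
        # XOR gate 31 or 32: the one-bit addition of the two working bits.
        return self.ram26[addr] ^ self.ram27[addr]

    def set_via_b(self, addr: int) -> None:
        self.ram27[addr] = self.ram26[addr] ^ 1  # inverted copy makes the XOR equal 1

    def clear_via_a(self, addr: int) -> None:
        self.ram26[addr] = self.ram27[addr]      # equal bits make the XOR equal 0


sb = XorScoreboard()
sb.set_via_b(4)
print(sb.flag(4))                            # 1: resource in use
sb.clear_via_a(4)
print(sb.flag(4), sb.ram26[4], sb.ram27[4])  # 0 1 1, matching the text
```

No data-in value appears in this model, matching the absence of data-in lines in Figure 5: the choice of port alone decides whether a flag is set or cleared.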
A scoreboard circuit as described can be implemented efficiently in a Xilinx® Virtex or Virtex2 FPGA, where it can be made using logic functions from two slices. Use of a programmable logic device is particularly convenient for implementing the current invention, as the logic primitives supplied in the device are generally arranged to be adaptable to several tasks, so increasing their utility. Programmable logic devices provided by other manufacturers may have different arrangements of logic elements, and the skilled person will understand that the principle of operation shown in the invention and embodiments described above may be applicable to these other devices, although the exact implementation details may differ.
Implementing the current invention in a programmable logic device is typically done by first providing an electronic description using graphical or text-based software. Various suppliers provide software suitable for this, including Mentor Graphics™, Viewlogic Systems™, Synplicity™ and Synopsys™. Such programs are able to describe the circuit at various levels of abstraction. Some describe the circuit operation in terms of constructs similar to those typically used in many programming languages. These descriptions are then fed into a synthesis tool that generates the actual circuit design netlist from the description. Other methods involve the user entering the circuit design directly as a hardware description, where no synthesis is needed. With these methods, the user can impose limitations on the way the circuit is laid out on the device.
The netlists produced by these methods may then be processed by tools specific to the target device to produce a configuration file, the production of which involves checks to ensure that the electronic description of the circuit is compatible with the capabilities of the particular programmable logic device.
The configuration file also contains placement and routing information, either completely synthesised from the netlist information or taken from limitations imposed by the user. The configuration file is then downloaded to the device to produce a working circuit.
It is advantageous with the current invention, particularly when implementing it in a Xilinx® Virtex or Virtex2 FPGA, to impose placement limitations that restrict the layout such that the minimum number of slices is used to create the embodiments.
The skilled person will be aware that other embodiments within the scope of the invention may be envisaged, and thus the invention should not be limited to embodiments herein described.