GB2517478A

GB2517478A - A method of transmitting data graph structures from one computer to another computer

Info

Publication number: GB2517478A
Application number: GB1315029.7A
Authority: GB
Inventors: Simon Robert Wiseman
Original assignee: Deep Secure Ltd
Current assignee: Everfox Ltd
Priority date: 2013-08-22
Filing date: 2013-08-22
Publication date: 2015-02-25
Also published as: GB201315029D0

Abstract

A method of transmitting a graph structure from a first computer to a second computer in which a plurality of messages containing the data of a node of the structure are sent from the first computer to the second computer. The second computer constructs nodes in its memory in response to the messages received in such a way that part of a node can be distinguished from the remainder of the node. The messages may include a reference to a previously sent node, or to a child pointer which must be updated with the address of a newly created node. These references are validated by determining if they refer to an inappropriate part of a node.

Description

A Method Of Transmitting Data Graph Structures From One Computer To Another Computer

Field of the Invention

The present invention relates to a method of transmitting graph data structures from one computer to another across a communications network. The invention relates particularly, but not exclusively, to transmission of acyclic graph data structures.

Backzround to the Invention A tree structure held in one computer can be transmitted to another computer in a series of steps, each step sending a message about one node of the tree. Nodes maybe sent in a depth-first, left-to-right fashion. This is readily accomplished by a recursive scan of the tree. The messages contain the node's name or other data and a count of the number of its child nodes.

The receiver maintains a stack of pointers to nodes that have been received but whose parent node has yet to be received. Received nodes are appended to a memory buffer and pointers to thcir child nodes are popped from the stack of pointers.

Trees constructed in this way cannot be malformed as a result of receiving erroneous messages. In particular links are always valid references to a node and the structure is acyclic.

Where a free contains multiple sub-trees that are identical, an application may choose to store the sub-tree just once, representing the tree as an acyclie graph by replacing all links to an instance of the common sub-tree to one common copy of the sub-tree.

A tree represented as an acyclic graph can still be transmitted by a recursive scan, but this means any common sub-trees are transmitted multiple times and the receiver will construct a tree containing separate instances of the common sub-trees. This consumes additional space and may result in the application pmcessing the same data multiple times and so is to be avoided.

It is pmposed that the sender marks nodes of the tree as they are sent and if encountered again, because there is a common sub-tree, a special message can be sent that refers to the previousiy sent common sub-tree. This special message refers to a node that has already been constructed.

The reference could be the index in the message stream of the message that carried that node, but the receiver must then search the memory for that node. The results could be cached but if the cache becomes full a search may still be required.

A parallel table of node addresses could be maintained, accessed by the index, but this results in inefficient memory address space usage as available memory must be divided into node storage, a stack and the table of node addresses. Titilisation with such a three-way allocation can be improved by splitting the tables into non-contiguous chunks, but this requires an clement of scanning which makes execution

time unpredictable.

It is instead proposed that the reference is the address, predicted by thc sender, of the node. However an attacker could deliberately send the ng address causing the receiver to eonstntct invalid acycic graphs in which a node references the internal data of another node rather than the start of a node. So a search or parallel table is still needed to validate the references. Embodiments of the present invention seek to address this problem.

Acyclic graphs can also be transmitted from one computer to another in a top-down fashion. This has the advantage that the receiving computer's application can commence processing the graph before it has been completely received.

With this approach the receiving computer will create nodes before the nodes they reference have been created, so when a node Is created It is necessary to update the nodes that reference it. The transmitting computer therefore needs to instruct the receiving computer as to the location of the nodes that need updating by providing a reference to them. Messages carrying these references may be erroneous) possibly as a result of attack, causing an invalid graph to be constructed. Embodiments of the present invention seek to address this problem.

Summary of the Invention

According to the present invention there is provided method of transmitting a graph structure from a first computer to a second computer using a plurality of messages containing the data of a node of the structure, wherein the second computer constructs nodes in its memory in response to the messages received in such a way that part of a node can be distinguished from the remainder of the node.

If a reference is sent to a previously sent node, or to a child pointer which must be updated, the reference may be validated by detecting if it refers to the correct part of a node stored by the second computer.

The method is thus particularly suited for transmitting an acyclic graph structure.

A tag bit associated with each data location may be used to distinguish between one part of a node and another.

Alternatively, part of a node may be distinguished from other parts of the node by containing a special value that is never stored in other parts of the node.

The plurality of messages may additionally contain references to previously transmitted nodes. This is the case where graphs are transmitted in a bottom-up fashion.

The second computer may construct nodes in such a way that the start of a node can be distinguished from the remainder of the node. Then, the second computer can validate the references contained in the messages by checking if the location they reference is the start of a node. If the location is identified as not being the start of a node, for example because a coresponding tag is not set or no special value is stored at the location, the reference may be rejected and a request sent for retransmission of the node.

Memory in the second computer may be arranged into fixed-sized data blocks containing an area of data values and an area of tag bits for those data values. The memory in the second computer may be additionally arranged into one or more overflow blocks that contains just data values data and the data blocks start with a special value that distinguishes them from overflow blocks.

The plurality of messages may additionally contain references to child pointer locations in previously transmitted nodes. This is the case where graphs are transmitted in a top-down flshion.

The second computer may construct nodes in its memory in such as way that child pointer refercnccs that havc not bccn populated can be distinguished from thc remainder of the node. Then, the second computer may validate the references contained in the messages by checking if the location they reference is a child pointer that has not been populated. lithe location is identified as a child pointer which has been populated, for example because a corresponding tag is not set or no special value is stored at the location, the reference may be rejected and a request sent for retransmission of the node.

BricfDcscrintion of thc Dntwinizs In order that the invention may be more clearly understood, prior art and embodiments of the invention will be now described, by way of example only, with reference to the accompanying drawings, of which: Figure 1 shows a tree structure; Figure 2 shows a prior art technique for transmitting a tree from one computer to another; Figure 3 shows the processes by which a transmitting computer transmits and a receiving computer receives a tree; Figure 4 shows how a tree with common sub-trees can be stored more efficiently as an acyclic graph; Figure 5 shows how prior art techniques of transmission of a tree result in common sub-trees being stored multiple times; Figure 6 illustrates an improved method where common sub-trees are stored just once; Figure? shows an overview of a method of an embodiment of the invention; Figure 8 shows an implementation of the embodiment of figure 7; Figure 9 shows how the embodiment of figure 7 helps to identify transmission errors; Figure 10 shows another embodiment of the invention; Figure 11 shows yet another embodiment of the invention; Figure 12 shows processes performed by a transmitting computer implementing a method according to an embodiment of the invention; Figure 13 shows further processes performed by a transmitting computer implementing a method according to the embodiment of the invention of figure 12; Figure 14 shows an example of data transmitted by a computer performing the processes of figures 12 and 13; Figure 15 shows processes performed by a receiving computer implementing a method according to the embodiment of the invention shown in figures 12 to 14; Figure 16 shows further processes performed by a receiving computer implementing a method according to the embodiment of the invention of figures 12 to 15; Figure 17 shows an example data transfer using the processes illustrated in figures 12to16; Figure 18 shows another prior art technique for transmitting data from one computer to another; and Figure 19 shows another embodiment of the invention, applicable to the technique shown in figure 18; Figure 20 shows a still further embodiment of the invention, applicable to the technique shown in figure 18; Figure 21 shows how an application running on the second computer can synchronise with building of the graph data structure on the computer; Figure 22 shows processes performed by a transmitting computer implementing a method according to another embodiment of the invention; Figure 23 shows an example of data transmitted by a computer performing the processes of figure 22; Figure 24 shows processes performed by a receiving computer implementing a method according to the embodiment of the invention shown in figures 22 and 23 Figure 25 shows further processes performed by a receiving computer implementing a method according to the embodiment of the invention of figures 22 to 24; Figure 26 shows an example of the use of top level and second level tables by a receiving computer implementing a method according to the embodiment of the invention of figures 22 to 25; and Figure 27 shows further processes performed by a receiving computer implementing a method according to the embodiment of the invention of figures 22 to 26.

Detailed Description of the Invention

Referring to the drawings, by way of background Figure 1 shows a tree structure 100.

Trees are a special case of a graph where a number of nodes are linked such that: * each node has a number, possibly zero, of links to child nodes; * exactly one node has no links leading to it; and * all other nodes have exactly one link leading to them.

The free structure 100 comprises a plurality of nodes 101, 104 and 105. The root 101 of the tree has no ancestors and two children 104, 105 identified by links 102 and 103 respectively.

A first computer containing the tree shown in Figure 1 can transmit it to a second computer using the steps shown in Figure 2. The first computer traverses the tree in a depth first, left to right fashion and for each node visited sends a message to the second computer. The messages contain the name or other data contained in the node and the number of child nodes.

The receiving computer builds a node for each message received and adds a pointer to the new node to a stack. When a node with N children is built, N pointers are popped from the stack and are built into the node as its children.

Figure 2 shows thc five steps needed to transmit the tree structure shown in Figure 1.

Initially the receiving computer's stack 202 is empty and the node store 203 is empty.

In step I the message 211 containing node D, which has no children, is sent. The receiving computer pops no links from the stack and creates a node containing D with no child links in the node store 213. It then pushes a link to node D onto the stack 212.

In step 2 the message containing node E, which has no children, is sent. The receiving computer pops no links from the stack and creates a node containing E with no child links in the node store. It then pushes a link to node E onto the stack.

In step 3 the message containing node B, which has two children, is sent. The receiving computer pops two links from the stack and creates a node containing B and the two links in the node store. It then pushes a link to node B onto the stack.

In step 4 the message containing node C, which has no children, is sent. The receiving computer pops no links from the stack and creates a node containing C with no child links in the node stores. It then pushes a link to node C onto the stack.

In step 5 the message containing node A, which has two children, is sent. The receiving computer pops two links from the stack and creates a node containing A and the two links in the node stores. It then pushes a link to node A onto the stack.

As this is the last message, the node store now contains the tree structure and the stack contains one link that refers to the root of the tree.

Figure 3a shows the process utilised by the transmitting computer. The transmitter uses a stack to remember the path it has taken to reach the node it is processing. The stack contains pairs of values, the first value being (the address of) a node that has yet to be sent and the second value is an index into that node's children indicating which child node is being processed.

Illitially the top node of the tree and index 1 are pushed onto the stack as a pair.

At any time the stack contains a list of nodes on the path from the top that the transmitter has taken. The indexes indicate which child of one node leads to the next node in the path.

Once the stack is empty transmission is complete.

If not empty, the top pair is popped from the stack. This is the node to be processed and the index indicates which of its children is next to be processed.

If the index is greater than the number of children then all the children have been transmitted and now the node's own data can be sent.

If the index is less than or equal to the number of children then this refers to the next child to transmit. In this casc the nodc is first pushed back onto thc stack with the index incremented to indicate that its next child still needs to be transmitted. The child node is transmitted by pushing it onto the stack along with an index of one to indicate that its first child needs to be sent. The transmitter then loops and finds this pair on the top of the stack.

Figure 3b shows the process used by the receiving computer. The computer maintains a stack of the memory addresses of nodes that have been received but whose parent nodes have not yet been received.

On receiving a node data message from the transmitting computer the node data and the number of child nodes (if any) are extracted from it and the receiving computer creates a new node which contains the node data and has space for each of the pointers to the indicated mimber of child nodes. The receiving computer then pops an address for each child node from the stack and stores that address in the new node, in the space cfi for the child nodes. Then, an address for the new node, just created, is pushed onto the stack. This process is repeated for subsequent messages.

Figure 4a shows a tree 401 which contains two sub-trees 402 and 403 that have identical content and structure. Figure 4b shows how the tree of figure 4a can be represented more efficiently, using an acyclic graph, with only one copy of the common sub-free being stored 412. The sub-tree is referenced by links 421 and 422.

An acylic graph is a special case of a graph where: * each node has a number, possibly zero, of links to child nodes; * exactly one node has no links leading to it; and * no node is its own ancestor.

Figure 5 shows the effect of transmitting the tree of Figure 4b using the prior art method of Figure 3. The common sub-tree 412 is traversed and transmitted twice, which results in it being stored twice. Thus the efficiency of using an acyclic graph is lost.

Figure 6 shows a proposed, improved method for transmitting the acycic graph of Figure 4b in which the common sub-tree is only transmitted and stored once. The transmitting computcr marks thc nodes of thc tree to indicatc whcthcr thcy havc already been transmitted. If the transmitter encounters a node that has already been transmitted the sub-free is not sent again. Instead a special message is sent that in some way identifies the previously transmitted sub-tree. On receipt of such a message, the receiving computer locates the previously received node and pushes a link to it onto the stack. There are several ways in which the special message can identi& the previously transmitted sub-tree.

A preferred method is that the transmitting computer predicts where the receiving computer will store each node and the special message contains the address of the previously sent sub-tree. The transmitting computer can readily predict the address of the location of each node relative to the start of the node memory by counting the size of cach nodc it transmits. On rcccipt of a spccial mcssagc the receiving computcr can simply translate the relative address into the link that is pushed onto the stack by adding the base address of the node store to it.

However, this has the disadvantage that an invalid address sent by the transmitter will cause the receiving computer to build an invalid tree. The receiving computer could validate the address in the special message by scanning through the node store from the start until the given address is Ibund, but this has the disadvantage of requiring a potentially time consuming search. The receiving computer could use a fixed size content addressable memory to record the addresses of the most recent received nodes, but if the address is not found in this memory a full search is still required.

The present invention provides a method for validating the address of a previously sent sub-tree comprised in a special message, without requiring additional tables that result in sub-optimal memory utilisation and without needing to search through memory.

Referring now to Figure 7, in one embodiment this is achieved by providing tags 701 associated with locations 702 in the node store 700. Each tag indicates whether a node is stored starting at the corresponding location or the location contains some value from within a node. By way of example, the tag 710 is set, to indicate that its location contains the start of node C 711, while tag 720 is clear to indicate that its associated location 721 contains part of the node B but not its start.

On receipt of a special message containing the address of a previously sent node the receiving computer retrieves the tag of the location referenced by the address. If the tag is set then the receiving computer can infer that the address is valid. If the tag is clear then the receiving computer can deduce that the address is invalid and reject it.

Thus the receiving computer can validate the structure of the tree it builds without searching the node store.

Figure 8 shows a direct method of providing the tags. Where data values in the nodes are 32 bits in length a computer memory with 33 bits per word is used, with 32 bits holding data and one holding the tag value.

This method is straightforward, but standard computer memory parts are often organised into words whose width reflects the size of data values manipulated which leaves no room for a tag bit.

An alternative method of distinguishing between the start of a node and other values stored within it is shown in Figure 9. Here a header word is added to the start of each node. The header word contains a special value that uniquely identifies it, in that the special value cannot be found in the remaining words of the node, or any other node.

The validity of a pointer is established by reading the word that it refers to. If the S value read equals the special value then the pointer is known to reference the start of a node and hence is valid. If the value read does not equal the special value then the pointer is known to reference some word of a node other than its start and hence is invalid.

As one skilled in the art will appreciate, identifying a special value in practice is straightforward. The words in a node will generally contain characters, pointers and counters. Pointers and counters arc generally restricted to small values because of limits on the size of the memory. Characters are numeric values in a restricted range, for example Unicode characters are never larger than 221. So if the word size is 32 bits, a special value of 232_i can be used.

By way of example, if an XML document is represented as a trce structure, it comprises nodes that are named, have named text attributes and may contain text. So an XML document can be represented as a tree of numeric counter values less than 2321 and Unicode characters less than 22 using a special valuc of 232_i.

The use of a special value in the header does preclude its use within the node, whereas the use of tag bits does not.

Another method of storing tag bits that does not involve the use of special memory parts is showil in Figure 10. The memory is divided into fixed sized blocks. Each block contains some tags and some data, with one tag bit for each data word. The tag bits indicate whether the corresponding word is the start of a node or some data within a nodc.

A pointer is validated by first calculating the address of the start of the block containing thc referenced location. Thc calculation is to dividc the pointer address by the block size to find the (integer) block number and then multiplying this by the block size to give its start address. Next the tag bit address is calculated by adding the offset of the referenced word within the block to the start address of the tag bit array.

The tag bit can now be read to validate the pointer.

When a node is created, it must be stored such that it is wholly contained within the data area of a block. This means some blocks will contain unused space. It also limits the size of nodes that can be stored in the trcc. This limitation can be lifted if blocks are given a header that contains a special value, in the same way as described with respect to nodes in Figure 9. A large node can now overflow from one block to another if it is so arranged that none of its data that is stored in locations normally used by a block header contain the special value.

Nodes will usually become large only if they contain some large array of elements.

Where the array contains characters finding a special value is straightforward as previously discussed, so the technique readily applies. If the array contains general integer values it cannot be stored in locations normally used by a block header as it may contain the special value. Instead it becomes necessary to split the array into portions, each of which is stored wholly within a block's data area.

Figures 12 to 17 illustrate an embodiment of the invention where node tagging is employed to enable data transmitted from one computer to another using a bottom up approach to be verified. The invention is implemented by appropriate software running on respective transmitting and receiving computers.

Figure 12 illustrates the process employed by the transmitting computer to transmit data using a bottom up approach.

The process proceeds in a similar fashion to that illustrated in Figure 3a, but with some additional steps. Where an entry's index which has been popped from the stack is less than or equal to the child count and a new entry is pushed onto the stack giving the node's address and the index of its next child to process the computer determines if the child node has already been sent. If the child node has not already been sent the computer proceeds as per the process illustrated in Figure 3 and a new entry is pushed onto the stack containing the child's address and index of one and the child node is then marked as having been sent. 1-lowever, if the child has already been sent a message containing a reference to the child node is sent.

The method by which the reference is determined is illustrated in Figure 13. It is known that thc recciving computcr follows a simple scheme of allocating nodes to memory in a sequential fashion (described below). The enables the transmitting computer to predict where nodes will be stored and so enables it to pass references to nodes as addresses.

The transmitting computer maintains a predicted address variable which is initially set to zero.

When the transmitter sends a node it first sends the node's data and the number of its children. Then it marks the node with the current value of the predicted address variable. The predicted address variable is subsequently incremented by the size of the transmitted node's data plus the size of its child pointers. Thus, when the next node is sent the predicted address variable will represent the address at which the beginning of that node will be stored by the receiving computer.

When the transmitting computer needs to send a node that has already been sent a reference message is sent containing the predicted address associated with the transmitted node.

Figure 14 illustrates how the graph illustrated in Figure 4b would be transmitted by the transmitting computer.

1. First the initial entry for the tree as a whole is pushed onto the stack. This contains the top node and index 1 to indicate that the first child of the top node is to be processed next.

2. (A,1) is popped, the index is incremented and (A,2) is pushed back on to the stack to record that the top node's second child will be processed next. Then the first child is processed, which means an entry containing it, node B, and the index 1 is pushed onto the stack.

3. First child of B is processcd ncxt. (B,2) is pushed onto the stack to rccord that thc second child of B needs processing, then the first child is processed by pushing (C,1) onto the stack.

4. The first child of C is processed, but C has no children so that means all its children have been deaR with and now a message containing C is sent, &ong with a child count of zero.

5. The second child of B is processed. (B,3) is pushed on to the stack as the third child of B needs processing, then the first child of B (which is D) is pushed onto the stack.

6. The first child of D is processed. D has no children so now a message containing D is sent, along with a child count of zero 7. The third child of B is processed, but B only has two children so now a message containing B is sent, along with a child count of two.

8. The second child of A is processed. (A,3) is pushed onto the stack to rccord that the third child must be processed next. Then the second child of A is processed. This is node B, but this has already been sent so a Reference to B is sent in a message to the receiver.

9. The third child ofA is processed, but this does not exist so a message containing A, along with a child count of two, is sent.

Figure 15 illustrates the process implemented on the receiving computer to enable it to receive data from the transmitting computer. This proceeds in a similar fashion to the process illustrated in Figure 3b, save for the addition of some steps. After receiving a message the computer first determines if the message is accompanied by a reference. If not, the process proceeds on the same basis as illustrated in Figure 3b.

If a reference is detected the node location it contains is extracted and converted to a real memory address by adding the base address of the node stored to it. Then the real memory address of the referenced node is pushed onto the stack.

The process is then repeated for any subsequent received messages.

Figure 16 illustrates how the receiving computer handles references.

The receiving computer uses a simple means of allocating memory for the nodes it creates which allows the transmitter to predict where a node will be stored. Each node is stored in consecutive memory locations. The data of a node is stored first followed by the addresses of each child node.

A variable, called Next Address, is used to track the next free location where the next node received is to be constructed. Initially the Next Address variable is set to zero.

Whcn a node data message is rcccivcd, the Next Address variable is pushed onto the stack, recording where the node is about to be constructed and the Next Address location is marked as the start of the node. If tags are used this is achieved by setting the tag, while all othcr tags corresponding to the contents of the node are cleared. If a special value is used to indicate the start of a node, the special value is written to the node store at Next Address and Next Address is incremented so it refers to the location that is to receive the node's data.

Then the data from the message is stored in the memory at the location given by Next Address. The number of child nodes is extracted from the message and this number of pointers are popped from the stack and stored in memory at the location given by Next Address.

After this, Next Address is incremented by the size of these pointers. Next Address now refers to the next free memory location after the node just constructed, which is where the next node received will be placed.

When a node reference message is received, the location it carries is extracted and checked.

The address of the location is compared with the Next Address variable. If the location is greater or equal to Next Address then it is clearly invalid since it is referring to a memory location beyond where any nodes have yet to be constructed and thc mcssagc is rqjcctcd.

II the location is less than Next Address the receiving computer then checks if the tag at the location has been set, or the location has otherwise been identified as the start of a node. If not, then the location does not reference the start of a node and it is invalid, because child pointers must always point to a node.

lIthe location does reference the start of a node that has already been constructed then this address is pushed onto the stack ready to be placed into its parent node when that is constructcd.

Figure 17 illustrates a further example transmission between the transmitting and receiving computers.

The upper part of the figure shows a simple tree containing a common sub-free with two references.

Following the steps described earlier the transmitting computer will send the three messages shown in the figure.

The tree is sent in a depth-first, left-to-right flishion so the Blue node is sent first This is thc first child of the root node and it has no children. Receipt of this causes thc receiver to store "Blue" in a new node and push the address of the start of this node onto the stack.

The next node to send is the second child of the root node, but this is the same sub-tree as the first child so has already been sent. Hence the second message is a reference to the predicted location of the first node sent, which is address zero. This causes the receiver to push the address onto the stack, so there are now two pointers to the Blue node on the stack.

The final message sends the data of the top node, "Red", which has two children. The data is copied into the next free memory location, after the Blue node, and two node addresses arc popped from the stack and copied into memory after the node's data.

Both pointers are to address zero so correctly point to the Blue node.

The lower part of the figure shows what happens if the transmitter sends incorrect messages and the receiver does not check for these. The first message is correct and results in the Blue node being constructed and its address pushed onto the stack.

The second message is a reference, but the address is inconect. Without the check the receiver pushes this onto the stack. Then when the Red node is received and built, the first address popped points correctly to the Blue node, but the second address points into the middle of the Blue node. Thus the structure created is not a valid tree.

If an application were to process the structure assuming it was a well formed tree, it will have unpredicatable and possibly insecure behaviour. For example by interpreting the "u" as the start of a node, the application may believe the node's data to be "uRed" and that it has two children, the second being itself. Thus the application sees a cyclic structure, which is effectively infinite and so may take infinite time to process, causing a loss of the service that the application is supposed to be providing.

By detecting that the address to "u" is incorrect this enor is avoided.

It will be appreciated that in this embodiment the receiving computer could use one of the other methods discussed above for identifying the start of a node, other than setting a tag bit.

An acyclic graph can also be transmitted from one computer to another in a known top-down fashion by sending a node before its child nodes. A node is sent as a message containing the node's data and a count of the number of child nodes. Figure 18 shows how graph 801 can be transmitted in a top-down fashion as a series of messages 811. Received nodes are constructed in a node store 812.

When a node is constructed its child pointers are set to the NULL pointer value.

The message that constructs a node contains references to all the pointers in previously received nodes that must be populated with a pointer to the new node.

If the references to pointer locations given in a message are invalid, the receiver will construct an invalid graph. This may lead to serious errors in the application processing the graph and so must be avoided.

Other embodiments of the invention may be employed to validate references to pointers, without requiring costly searches or resulting in inefficient use of memory. The method arranges that memory locations containing NULL pointers can be distinguished from locations containing populated pointers or data values.

In the embodiment illustrated in figure 19a the memory locations of the node store 902 are tagged with tag bits 901. These can be set in order to indicate that the memory location has a NULL value.

Alternatively, in the embodiment illustrated in figure 19b NULL pointers are set to a special value 912 that is never used as a pointer 911 or data value 913.

Thus, if a reference to a pointer location refers to a location which does not have an associated tag set to indicate that the location has a NULL value, or is not set to a special value, the receiving computer knows that the location is an invalid one.

An advantage of a receiving computer constructing a graph in a top-down fashion, is that the application on the receiving computer may start processing the graph before it has been received in its entirety. Figure 20 shows a graph 1001 that has been partially received, with nodes 1002 and 1003 received and node 1004 yet to be received.

At this point the node store contains a representation 1011 of node 1002 including a pointer 1012 to its child 1003 that has been received and a NULL pointer 1013 that will be set to point to its child 1004 once that has been received.

If, while processing received node 1002, stored as 1011, the application needs to follow the pointer to node 1003 it may do so as the pointer is populated 1012.

However if it needs to follow the pointer to node 1004 it must wait because the pointer has yet to be populated 1013. Once the receiver has received and built node 1004 pointer 1013 will be populated so that it refers to the stored node and the application may proceed with processing.

Figure 21 illustrates the method by which the receiving computer's graph builder and tree processing application cooperate to allow the application to proceed with its processing in parallel with the graph being built. A variable shared by the two processes, referred to as AppLocation in the flow charts, is used to synchronise the two processes when this is necessary. The variable is set by the application when it finds it needs to follow a NULL pointer. Having set the variable the application waits for a signal from the tree builder process. When the graph builder creates a node it fills in all the required pointers to that node to complete the graph. Once a pointer is updated, the builder checks the referenced location against the AppLocation variable and if equal it signals the application to indicate that the pointer has now been populated and can be read to continue processing.

Figures 22 to 27 illustrate a further embodiment of the invention employing a top down technique for transmitting data from a first transmitting computer to a second receiving computer. The method is implemented by appropriate software running on the respective computers.

Figure 22 illustrates the steps performed by the transmitting computer to transmit data in a top down fashion.

The process runs along similar lines to that illustrated in Figure 18.

The transmitter uses a queue to keep track of which nodes need to be sent. It also marks nodes with a sequence number and a list of parent nodes that reference them along with an indication of which child of that parent node they are.

The transmitter starts by putting the root node onto the queue.

While the queue is not empty, the transmitter takes the first node off the queue and processes it. For each node processed, the transmitter assigns it a sequence number and then visits each child node in turn. If the child has not been visited it is added to the queue. Then the child is marked with the current node's sequence number and current child index value. The transmitter can determine if a node has been visited before because those that have yet to be visited have no sequence number assigned.

Once all the children of a node have been visited, a message is sent describing the node. The message contains the node's data and the number of children it has, allowing the receiver to construct the node in memory. The message also contains the list of marks that have been added to the node. These tell the receiver which nodes need updating to refer to the new node.

Figure 23 shows an example transmission of the tree illustrated in the figure using the process shown in figure 22. The tree is shown annotated with the sequence numbers that are assigned to the nodes and the marks that are associated with them. The marks are a pair of numbers, the first being the sequence number of the parent node and the second the index into that node's children to the pointer to the marked node. Note that node D has two parents, it is a shared common sub-tree of B and F, so has two marks. The root node A has no parents and so has no marks. The transmission involves the following steps: 1. The root node A is queued 2. A is removed from the queue and given the sequence number 1 3. The children of A are visited in turn, B is queued and marked that it is the first child of node A 4. E is queued and marked that it is the second child of A 5. Node A's data is sent, along with an indication it has two children 6. The next node in the queue, B, is taken and given the sequence number 2 7. The children of B are visited in turn, C is queued and marked as the first child of B 8. D is queued and marked as the second child of B 9. Node B's data is sent, along with an indication it has two children and that it is the first child of A 10. The next node in the queue, F, is taken and given the sequence number of 3 11. The children of E are visited in turn. D has already been visited so is not added to the queue again, but it is marked as being the first child of E in addition to the second child of B. 12. Node F's data is sent, along with an indication it has one child and that it is the second child of A 13. The next node in the queue, C, is taken and given the sequence number 4.

14. The children of C are visited in turn, but there are none. So node C's data is sent, along with an indication it has no children and that it is the first child of

B

15. The next node in the queue, D, is taken and given the sequence number 5.

16. The children of D are visited in turn, but there are none. So node D's data is sent along with an indication it has no children and that it is the second child of B and the first child of E The queue is now empty so the transmission is complete Figure 24 shows the steps performed by the receiving computer when receiving data transmitted by the transmitting computer in order to create a copy of the tree being transmitted by the transmitting computer.

The receiver constructs nodes in consecutive memory locations. It receives a message containing a node's data and writes this into the next free location in memory. It then obtains the number of child nodes from the message and appends a NULL pointer to the node data for each one, or otherwise identifies the child pointer references as unpopulated using one of the alternate approaches discussed above.

Next, the receiver visits each of the marks in the message. Each mark refers to one of the child pointers of a previously created node. The mark information is converted to the memory address of the child pointer using a technique described below.

The receiver then checks if the addressed location contains a NULL pointer. If it does not then the mark is invalid, as NULL pointers only appear in the location of a child pointer that has not yet been set. If the addressed location does contain a NULL then the address of the newly constructed node is written to this location.

The receiver then moves on to the next mark. Thus, by having allocated null pointers to un-set child pointers the receiving computer is able to detect certain errors in data transmission.

Figure 25 shows how the receiving computer is able to convert the marks to a memory address, without any searching and without using large amount of memory.

The receiver maintains a queue of node addresses contains the address of each node that is yet to be completely written. The nodes are stored in the queue in the order in which they will be transmitted, which is the same order in which they will be completed.

Logically, the queue is a list of node addresses along with the sequence number of the node that is first in the queue. A mark is converted to an address by first taking the node sequence number from the mark and calculating which address in the queue is that of the referenced node by subtracting the first-in-queue sequence number from the mark sequence number to produce an offset into the queue to the referenced node's address. Next the node's data is examined to determine its length. As the first child pointer follows the node's data, adding the node's start address and size of the node's data together give the address of the first child pointer. The mark's child index is added to this address to yield the address of the referenced child pointer.

The top portion of the figure shows three nodes, the root node is Red and this has two children: Blue and Green. Blue has one child that is yet to be received and Green has two children that are yet to be received. The queue contains the addresses of the two incomplete nodes. When the next node is received this is placed after the Green node and the marks indicate which of the NULL pointers in the Blue and Green nodes need to be updated to refer to the new node.

The lower portion of the figure shows how the mark is converted to the address of the child pointer.

The sequence number of the node whose address is first in the queue is subtracted from the mark's node sequence number to produce an index into the queue to the address of the node referenced by the mark. The memory at this address is read to determine the size of the referenced nodes' data. This size is added to the node's start address and the child pointer index from the mark to produce the address of the child pointer that is referenced by the mark.

To enable efficient memory usage, the queue is implemented by a top level and second level tables. These are shown in figure 26.

The top level table contains a sequence of addresses that refer to second level tables. It has a fixed length that cannot be changed if it becomes full. The sequence may start anywhere in the table and if necessary wraps around to the beginning of the table. This is implemented by noting the location of the first address in the sequence.

The second level tables contain sequences of addresses of nodes. This sequence does not wrap round but need not start at the beginning of the table and need

not fill the table.

Working from the top left in figure 26, the first diagram shows the queue having one item in it, A. There is one second level table. The top level table contains a pointer to the second level table at the beginning. The second level table contains the pointer to A also at its start.

The second diagram shows the state after item B is added to the queue. The pointer to B is stored in the next free place in the last second level table.

The third diagram shows the state after the first pointer, to A, is removed from the front of the queue. This is implemented by incrementing the "first address" offset associated with the second level block so no memory copying is involved.

The fourth diagram shows a pointer to C added to the queue. The second level block is full so a new second level block created to accommodate the new pointer. The address of this is added to the top level block.

Moving now to the second row of diagrams in the figure and starting at the left hand side the fifth diagram shows a pointer to D is added where there is room for this is the last second level block so it is added there.

As shown in the sixth diagram, when the pointer to B is removed from the queue, the first of the second level blocks becomes empty. So this block is discarded and the pointer to it is removed from the top level block by incrementing the "first address". If this reaches the end of the top level block it wraps round to the start of the block.

In the seventh diagram, when a pointer to E is added to the queue there is no room in the last second level block, so a new second level block is allocated. But when the pointer to the new second level block is added to the top level there is no room for it to go on the end so it wraps around and is placed at the beginning.

Figure 27 shows the logic used to map an index into the sequence of nodes to a node address.

The sequence number of the first node in the queue is subtracted from the index to give a relative index. This relative index is divided by the size of the second level tables to produce a whole number and remainder. The whole number is a index into the sequence of second level tables and the remainder is an offset within the selected table. The remainder is the offset within the second level block to the node's address.

The whole number is added to the index of the first address in the top level table to give an offset into the top level table, but this is value is then wrapped around by using a module function which takes the remainder of the value when divided by the length of the top level table. This gives an offset into the first level table to the location of the relevant second level table. The top level table is indexed with this value to produce the address of the second level table.

Next the second level table is accessed to get the index to the first address within it. This value is added to the address of the second eve.1 table and the remainder to produce the node address.

The above embodiments are described byway of example only. Many variations are possible without departing from the invention as defined by the appended claims.

Claims

Claims 1. A method of transmitting a graph structure from a first computer to a second computer using a plurality of messages containing the data of a node of the structure, wherein the second computer constructs nodes in its memory in response to the messages received in such a way that part of a node can be distinguished from the remainder of the node.
2. A method as claimed in claim 1 for transmitting an acyclic graph structure.
3. A method as claimed in either claim I or 2 wherein a tag bit associated with each data location is used to distinguish between one part of a node and another.
4. A method as claimed in either claim 1 or 2 wherein part of a node is distinguished from other parts of the node by containing a special value that is never storcd in other parts of the node.
5. A method as claimed in any preceding claim wherein the plurality of messages additionally contain references to previously transmitted nodes.
6. A method as claimed in any preceding claim wherein the second computer constructs nodes in such as way that the start of a node can be distinguished from the remainder of the node.
7. A method as claimed in claim 6 wherein the second computer validates the references contained in the messages by checking if the location they reference is the start of a node.
8. A method as claimed in claim 7 wherein the second computer rejects a message containing a reference which refers to a location which is not identified as the start of a node.
9. A method as claimed in any preceding claim wherein memory in the second computer is arranged into fixed-sized data blocks containing an area of data values and an area of tag bits fly those data values.
10. A method as claimed in claim 9, wherein memory in the second computer is additionally arranged into one or more overflow blocks that contains just data values, and the data blocks start with a special value that distinguishes them from overflow blocks.
11. A method as claimed in of claims 1 to 4 wherein the plurality of messages additionally contain refcrenccs to child pointer locations in previously transmitted nodes.
12. A method as claimed in claim 11 wherein the second computer constructs nodes in its memory in such as way that child pointer references that have not been populatcd can be distinguished from the remaindcr of the nodc.
13. A method as claimed in claim 12 wherein the second computer validates the rekrences contained in the messages by checking if the location they reference is a child pointer that has not been populated.
14. A method as claimed in claim 13 wherein the second computer rejects a message containing a reference which refers to a location which is not identified as a child pointer which has not been populated.
15. A method of transmitting a graph structure from a first computer to a second computer substantially as herein described with reference to the accompanying drawings.