TREE DIFF
In order to find the difference between two trees, you need only to display them, or with more detail to display their elements in a table whose elements contrast the respective elements. However, in order to give greater meaning to the result, such a table must be synthesized into a higher level language. Such a language should be written in terms of a simple language of Ket operations that map the first tree into the second. This requires that a sufficient range of commands be provided, and that the 'shortest route', i.e. the simplest 'Ket script', be found.
Such a program may be generated using planning software, some concrete algorithm or by stochastically guessing the form of the program from the necessary component operations required to realize the composite affect.
Code already exists to compare and contrast the various tree-element-pairs (element_i, element_j). Next, higher level facts must be determined stating:
1) The uniqueness of a given term;
2) The elimination of a given sub-branch or ancestor-branch.
3) ...
Generally though multiple mappings will take place at once so one branch may be duplicated then further modified twice. To express this as code requires only that the various realizations of a given operation be contrasted and the shortest selected. However, for a great many steps this will prove prohibitively expensive.
There are then various layers of interpretation, or equivalently various levels of abstraction or languages in which the same information is expressed and synthesized.
e.g. for "sin:x" -> "cos:y"
1) table: compare(tree_i \dot tree_j)
sin(x) x
cos(y) {...} {...}
y {...} {...}
2) row, column: row(table), column(table)
{"sin(x)":{...}, "cos(x)":{...}}
3) atomic-diff-set:
move-to 0
rename "cos"
move-to 1
rename "y"
4) atomic-action-set:
5) actual-function-alternatives:
0 s"cos" w s"y"
0 s"cos" w r"y"
0 r"cos:x"
6) selection:
0 r"cos:x"
Here the rename operation has been used, but while it is shortest in terms of string length, in terms of information entropy it is far larger. Exactly how a suitable balance should be found remains unclear but it is however clear that renaming the root branch is often too broad while too many steps quickly loose the reader's attention. Also, the shortest string was selected, but 's' would be chosen over 'r' because it is a more specific command. All of this becomes far more complicated when considering more complicated mappings.
Note that cost should also consider characters which are arguments of the find operation. Because movement is ubiquitous, addresses are perhaps better, but find is almost always a faster operation. However, because it contains an arbitrary character, this increases the associated entropy of the command (though so too would the address of an explicit move-to command). But this character introduces a degree of arbitrariness. Just because the same initial letter starts functions would make finding some variables more expensive.
Another longer example maps from "sin(x)^2+cos(x)^2=1" to "sin(x)/cos(x)=tan(x)" which is clearly a greater jump. At this point, the reader may note that the two strings are effectively 'discontinuous' and this could be reflected either by the 'information waste' associated with a given expression or else just how many steps would be required to transform the expression. Here the result should resemble
fs dp fc dp <space> s"/" f1 r"tan(x)"
but the details are unimportant.
In the next example a very verbose mapping will be explored. Here "(a*x+b)/(c*x+d)=f(x)" would be mapped into "f(x)=(a*x+b)/(c*x+d)+b/(c*x+d)".
The simplest description would note that
c*x+d (1) is duplicated,
f(x) is moved,
...
Digressing, it should be noted that a great deal of redundancy will enter such reports because for every sub-branch that is moved, so too are all other siblings affected. The key to effectively simplifying this problem is to identify and eliminate such redundancy. The problem here though is that either multiple summaries are returned or else the first or shortest sub-branch is reported as being moved which ignores the potential for higher-level symmetry.
The simplest operations include
ff H fb T
But this requires a large vocabulary of commands that will not initially be added. Using a reduced set of commands this would become
...
Note that simplifying specific patterns with a greater range of commands would allow for a given list of commands to be further simplified in an additional step.
This requires only that unambiguous matches be made. That is, a*x+b, c*x+d and f(x) are each patterns that reappears exactly only once. Additionally the '=' is preserved and the '/' operation is duplicated. With this in mind, the