The purpose of the parsing package is to parse a given algebraic expression robustly. The key property of the package is to give meaningful error messages which will hopefully be recovered from intelligently. Furthermore, it is useful to be able to define various ad hoc notation extensions to simplify verbose tasks such as expressing multiplying by powers of ten or taking powers of unitary and prefix functions, e.g. /cos^n 3/. Additionally, this should serve as inspiration for future algebraic regular expression language.
The implementation involves recognizing patterns within a given list of tokens that have already been unambiguously parsed. In order to do so, potential operations are identified: first keeping only those that have required symbols, then keeping only those whose symbols appear in the correct order and in sufficient numbers.
For example, then the expression
|a|>3
would select only those operators that an contain | and >, i.e. abs, norm, gt, ket etc., even though some of these operations cannot be legally constructed.
The next step is to associate indices of the token list with potential functions. To begin with, list indices containing numbers and letters are labelled as such. Next an exhaustive list of all interpretations of each function is generated. Rather than construct an individual object for every combination of indices that a function explains, a 'sub-mask' class is generated that represents unique prefix and postfix indices (or null if they are not defined). In some cases this vastly reduces the number of such objects.
Next the given set of sub-masks are interpreted as being legal or otherwise and are discarded as such. That is, some functions require prefix/infix/postfix symbols as well as a certain numbers of arguments. By recognizing which sub-masks are legal, the collection is further reduced.
Parts of the token list will be unambiguous and the corresponding list indices will unambiguously match. As the sub-masks are organized, such excluded indices will restrict or eliminate other indices. This not only applies to trivial exclusions of symbols that could be associated with other functions, but also those that render illegal another function.
Next, the sub-masks are grouped together into orthogonal groups. An orthogonal group is any grouping of a subset of the sub-masks for which it is possible for no-two indices to require the same list index. This does not however require that it be impossible that an overlap occur. After all, sub-masks ambiguously refer to multiple cells.
An exception may then be thrown if there no subset exists that 'spans' the index list. That is, there exists no orthogonal set of sub-masks that explain every index of the token list.
By eliminating all sub-mask subsets, those that are left may be organized into a hierarchy. This requires that a sub-mask contains arguments that are expressed as ranges within the index list. In order for this to represent a tree structure requires that sub-masks either lie outside of the given range, representing an ancestor branch, or lie completely within the given range.
Token token-list: ........................
Vector<SubMask> indexList: ........................
SubMask unambiguousMatches: 011001010101010101001011
Each sub-mask represents all possible interpretations of a given function within a
1) Identify those functions with symbols in the token list.
2) Label unambiguous matches of numbers, symbols, variables and comments.
3) Create all legal sub-masks
4) Check the legality of sub-masks in more detail
5) Identify unambiguous function sub-masks.
6) Identify subsets based by accreting other sub-masks around unambiguous matches.
7) Eliminate mask-groups that do not span the list.
8) Of those spanning groups, seek a hierarchy.
9) Attempt to instantiate every possible legal match.
A) Create the corresponding data structure(s).