GB2384880A

GB2384880A - A method for associating help information with a source text or video

Info

Publication number: GB2384880A
Application number: GB0202480A
Authority: GB
Inventors: Susan Margaret Mccaig; Alexander Moir
Original assignee: Individual
Current assignee: Individual
Priority date: 2002-02-02
Filing date: 2002-02-02
Publication date: 2003-08-06
Also published as: GB0202480D0

Abstract

A dog lead consisting of a length of elastic which may stretch under a load but which is limited to a finite amount of extension by a non-elastic material. The elastic component may pass through the non-elastic component at right angles by way of a series of holes made in the non-elastic component. The elastic material, which is preferably either bungee cord or shock cord, may feature end stops which are attached to the non-elastic material at the limits of the elastic cord. The ends of the non-elastic component may be provided with either trigger hooks or metal or plastic rings and may be formed of webbing. The lead may be provided with luminescent or fluorescent material or some form of lights or LEDs.

Description


A Method for Associating Help Information with a Source Text or Video

Background to the Invention

A method of improving one's understanding of a foreign language is to read books, magazines and other quality textual material in the language, referring to a dictionary, thesaurus or grammar of the language for unfamiliar words and constructions in order to learn them in the context of their use. Learning or improving one's understanding of a foreign language by this means can be slow and difficult, however. It helps to have a native speaker of the language on hand for assistance when needed. This not only shortcuts the process of looking up unfamiliar words and phrases but also allows the assistant to explain the text beyond its literal meaning, e.g. the meaning of idioms, metaphors and slang. Better still is a foreign language expert who is not only a native speaker but also knows everything about the proper and everyday use of the language.

The genesis of this Invention was the idea of providing an automatic foreign language expert that gives help, on request, to a reader viewing a foreign language text on an electronic screen. This automatic foreign language expert could be implemented in computer hardware, software or some combination of the two, either as part of an existing product or service or as a special packaged product for the purpose. The underlying method of the Invention proved, however, to be applicable to a broader range of applications than foreign language assistance alone: help can be provided for texts other than foreign language texts, such as literary criticism and domain-specific language; help can be provided in formats other than text (e.g. audio output for learning pronunciation); and the method can be applied to offer help with video displayed on a screen rather than text.

Overview of the Invention

The Invention is for use with textual or digital video content that is, or can be, delivered electronically and is subjected to an editing process. Typically this is 'valuable' content such as published books, electronic manuals, news stories supplied over news wires, video programmes, and so on. The Invention requires content to be 'marked up' in an editing process in order to identify the portions of it for which help (e.g. a translation) may be provided.

When a reader requests help at some point in a source text, the mark-up at that point determines the type of help required, e.g. the translation of a phrase or sentence. The help is obtained by executing a help functional expression (help expression), which is chosen according to the mark-up. In the simplest case this help expression is a look-up function that finds the help information (the translation) in a help file and displays it to the user. A process that embodies this basic method for a particular source text or type of source text, in which mark-up leads to the execution of a help expression that returns help information for output, is referred to as an Automated Text Expert.

In addition, a General Automated Text Expert (abbreviated GATE), typically implemented in software, is described, which is a 'shell' for Automated Text Experts. By supplying the GATE with parameters, including the definitions of the mark-up to be used, corresponding labels for use in help menus, the definitions of the help expressions to be executed for each different type of mark-up, and pointers to the relevant look-up files, the GATE becomes an Automated Text Expert for a particular source text or class of source texts, and for the help required, without additional hardware or software development. If additional special help functions are required, these can be defined to the GATE to become part of the Automated Text Expert for that particular application.

Analogous to the method and processes for text, a similar method and processes, including a General Automated Video Expert process, is described for use with digital video sources that are marked up for the purpose of providing help information with them.

In relation to foreign language applications in particular, this Invention is very different from other generic methods that already exist. Sophisticated machine translation methods have been invented to help translate foreign language texts, and these methods can be applied to ad hoc pieces of text typed in by a user. An example is Babel Fish (http://babelfish.altavista.com/). However, machine translation methods depend upon natural language analysis, which is often imperfect when confronted with non-literal text (such as idioms), and on the generation of translation text, which is generic and usually not as good as an expert human translator can produce in terms of subtlety of meaning, sophistication of use of language and style appropriate to the intention of the author. This Invention does not have these limitations, since it can provide high quality translations of a published text that has been marked up appropriately for the purpose.

Description of the Invention

Introduction

The Invention is embodied in a software process that allows the association of help information with a source text, which is to be read on an electronic screen, by the use of mark-up with special 'text tags' that are invisible during display of the source text to the reader but which serve to delimit portions of the source text.

Later, a variation is described in which the mark-up of the source text is maintained in a file separate from the source text file obviating the need for any insertions of tags into the source text itself. The examples below are presented in terms of mark-up inserted into the source text, however, because that presentation makes the description of the method easier to understand.

Later a variation is described in which the method is applied to video viewed at an electronic screen.

The process of associating help information with source text and providing help to a reader at an electronic screen is referred to as the Automated Text Expert. The total set of help information for any particular application is called the Help Information Corpus.

A particular application of the Automated Text Expert is help with foreign language understanding and foreign language learning in which the source text is in a foreign language and the help information is in the native language of the reader. Many of the examples in this document are examples of help with foreign language understanding or with foreign language learning but the association of help information with source text using the Invention has many other applications: for example associating a commentary with a source text (e.g. a literary commentary on a play or a poem) that the reader can call up on demand, providing explanations of the specialist terms or phrases that occur in a specialist source text (e.g. a technical manual) that the reader can call up on demand, and so on.

Figure 1 illustrates five basic steps in providing a reader with help with a source text, and shows a 'Before' and an 'After' screen.

1. The text has been marked-up with 'tags' to delimit portions of it. The text is displayed to the reader on an electronic screen of some sort without the mark-up tags.

2. At some point during reading, the reader requests help using some input device, e.g. clicking the mouse over a particular character of the source text on the screen.

3. Using the marked-up version of the text, the Automated Text Expert is able to determine to which portion(s) of the source text the reader is referring for help and how these portions have been marked up.

4. The Automated Text Expert now executes a Help Function (help functional expression), which consists of retrieving help information, usually from the Help Information Corpus, as determined by the tag used for mark-up and the particular portion of (marked up) source text referred to by the reader.

5. The help information is output to the reader.
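The five steps can be sketched in Python as follows. This is a minimal illustration, not the Invention's implementation: it assumes the mark-up has already been reduced to (tag text, start, end) positions, uses the hypothetical tag texts 'I' (idiom) and 'N' (noun), and models the reader's choice among overlapping portions as a `choose` callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Portion:
    tag_text: str  # hypothetical tag texts: "I" = idiom, "N" = noun
    start: int     # position before the portion's first character
    end: int       # position after the portion's last character


def portions_at(portions: List[Portion], char_index: int) -> List[Portion]:
    # Step 3: the character at char_index lies between positions char_index and char_index + 1
    return [p for p in portions if p.start <= char_index < p.end]


def request_help(text: str, portions: List[Portion], char_index: int,
                 help_functions: Dict[str, Callable[[str], Optional[str]]],
                 choose: Callable[[List[Portion]], Portion]) -> Optional[str]:
    candidates = portions_at(portions, char_index)         # step 3
    if not candidates:
        return None                                         # nothing is marked up here
    portion = choose(candidates)                            # reader picks, e.g. from a menu
    fragment = text[portion.start:portion.end]
    return help_functions[portion.tag_text](fragment)       # step 4; caller outputs it (step 5)


text = "SHE WAS OVER THE MOON AT HIS PROPOSAL"     # step 1: displayed without tags
portions = [Portion("I", 8, 21), Portion("N", 13, 21)]
idioms = {"OVER THE MOON": "ECSTATIQUE, LITERALEMENT AU DESSUS DE LA LUNE"}
nouns = {"THE MOON": "LA LUNE"}
help_functions = {"I": idioms.get, "N": nouns.get}


def pick_idiom(candidates: List[Portion]) -> Portion:
    return next(p for p in candidates if p.tag_text == "I")


# Step 2: the reader clicks on the 'H' of 'THE' (index 14)
print(request_help(text, portions, 14, help_functions, pick_idiom))
# prints ECSTATIQUE, LITERALEMENT AU DESSUS DE LA LUNE
```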

Examples

Figure 2 illustrates the method and process of providing help with understanding or learning a foreign language. In this example the source text is in English and the help information is in French. The choice of mark-up symbols ('[P[', '[V[', '[N[', etc.) is for illustration only and is not prescribed by the Invention. While reading the text '... SHE WAS OVER THE MOON AT HIS PROPOSAL...' the reader requests help by, say, pointing and clicking the mouse over the text 'THE'. In this case the Automated Text Expert cannot determine whether the reader wants help with the word 'THE', the noun 'THE MOON' or the idiom 'OVER THE MOON', so the reader is offered a choice. The reader chooses help with the idiom 'OVER THE MOON', and the Automated Text Expert looks up the idiom 'OVER THE MOON' in the Help Information Corpus file for idioms in order to find French help information for it. The text 'OVER THE MOON' may, of course, be used non-idiomatically elsewhere in the same source text, as in 'THE ROCKET SHOT INTO THE NIGHT SKY AND DURING ITS DESCENT APPEARED TO FLY OVER THE MOON'. In that case the text 'OVER THE MOON' is not marked up as an idiom.

Figure 3 is an outline flowchart of the underlying process of the Automated Text Expert for the example in Figure 2. The Automated Text Expert uses 'Associative Look-Up', in which the help information is associated with the marked up source text in a Help Information Corpus file (rather as a word and its definition are associated in a dictionary). Every occurrence of the idiom 'OVER THE MOON' in the source text will result in the same piece of French help information. The organization of the Help Information Corpus and the corresponding operation of the look-up function may be implemented by any suitable, well-understood computer method, e.g. sequential search, binary search, index sequential search or random access search, depending upon the size of the Help Information Corpus file and the performance requirements of the function.
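A minimal sketch of Associative Look-Up over a Help Information Corpus file kept sorted for binary search, one of the methods listed above. The class name `AssociativeFile` and the sample records are illustrative assumptions; a Python `dict` would give a hashed (random access) alternative.

```python
import bisect
from typing import List, Optional, Tuple


class AssociativeFile:
    """Hypothetical Help Information Corpus file for Associative Look-Up."""

    def __init__(self, records: List[Tuple[str, str]]):
        ordered = sorted(records)                     # sort records by their source-text key
        self.keys = [key for key, _ in ordered]
        self.values = [value for _, value in ordered]

    def look_up(self, key: str) -> Optional[str]:
        i = bisect.bisect_left(self.keys, key)        # binary search for the key
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None                                   # no help information for this text


idioms = AssociativeFile([
    ("OVER THE MOON", "ECSTATIQUE, LITERALEMENT AU DESSUS DE LA LUNE"),
    ("BREAK THE ICE", "ROMPRE LA GLACE"),
])
```

Because every occurrence of a marked up idiom uses the same key, each one returns the same record, as the text notes.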

Figure 4 shows an example of a different type of mark-up, Help Information Corpus file and help function. In this example, a source text in English has been 'synchronized' with a translation of it into French, which will typically have been created by an expert human translator. The example shows that the translator has chosen to translate one English sentence into two French sentences. In general, there may be a 'many-to-many' correspondence between portions of source text and portions of help information (the translation). In this example the unit of translation is a single sentence in the English source text, but it need not be.

Figure 5 shows an outline flowchart of the underlying process of the Automated Text Expert corresponding to the example in Figure 4. In this example every marked up portion of source text has an associated number (or, equivalently, identifier of an enumerated type). The number is used to identify the corresponding piece of help information in the Help Information Corpus file containing the translation. Typically, therefore, the Help Information Corpus file would be organised as an index sequential file in which the nth element in the index points to the help information corresponding to the nth portion of source text; the help function is an 'Indexed Look-Up' of the help information (translation).
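Indexed Look-Up can be sketched as a plain list in which entry n holds the help information for the nth marked up portion. Numbering from 1 and the placeholder strings are assumptions. A single entry can hold two French sentences, which is how the one-to-two correspondence of Figure 4 would be stored.

```python
from typing import List


def indexed_look_up(n: int, corpus: List[str]) -> str:
    """Return the help information for the nth marked up portion (numbered from 1)."""
    if not 1 <= n <= len(corpus):
        raise IndexError(f"no help information for portion {n}")
    return corpus[n - 1]


# One entry per marked up English sentence (placeholders for the translation)
translation = [
    "<French sentence 1a> <French sentence 1b>",  # one English sentence, two French ones
    "<French sentence 2>",
]
```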

Mark-Up

The method of the Invention requires that source text be marked up so that the Automated Text Expert can distinguish those portions of it for which help may be provided. Not all of a particular source text needs to be marked up. (Source text may be marked up by inserting tags into it, as in the examples above, or by preparing a separate mark-up file containing tag information, as described in the next section.) The Invention does not define or prescribe the exact characters, symbols or conventions to be used for mark-up tags. These may vary from application to application. Different mark-up 'standards' for different classes of application may be created from time to time.

Some applications may require one type of tag only (e.g. to separate one portion of text from the next).

More generally, however, multiple tag types are required for different purposes (e.g. a sentence separator, a paragraph separator, a chapter separator). Therefore, a tag will contain some 'tag text' to distinguish it from other tags (e.g. the tag text 'N' in the tag '[N[' in Figure 2 distinguishes the noun tag from the verb tag '[V[', which has tag text 'V'). The method does not define or prescribe the form or content of tag text (but see the description of the General Automated Text Expert below for the tag conventions that it uses).

Some applications may require the use of start and end tags rather than just separator tags to delimit a portion of text (e.g. to mark the start and end of an idiom). Separator tags are defined to be both start and end tags simultaneously, except for the first separator in a source text, which is a start tag, and the last separator, which is an end tag.

In some applications start and end tags can be the same (e.g. '/I/' to mark both the beginning and the end of an idiom). Since start and end tags are always paired, a start tag can always be distinguished from an end tag with the same representation by counting from the beginning of the source text, or of another delimited portion of surrounding text that provides a context. For example, in the marked up source text:

[SENTENCE[.../PHRASE/ A /PHRASE/ some text /PHRASE/ B /PHRASE/]SENTENCE]...

the '[SENTENCE[' and ']SENTENCE]' tags provide a context for interpreting the '/PHRASE/' tags in order to distinguish phrases A and B. There is no danger of misinterpreting 'some text' as a marked up phrase even though it is surrounded by /PHRASE/ tags. In general, however, it is more convenient to have different representations for start and end tags, and this also permits marked up portions of text to be nested as in, for example:
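The counting rule can be sketched as follows, assuming the surrounding sentence context has already been isolated. Within that context, odd-numbered occurrences of the tag are start tags and even-numbered ones are end tags. The function name `phrases_in` is hypothetical.

```python
from typing import List


def phrases_in(context: str, tag: str = "/PHRASE/") -> List[str]:
    """Return the portions enclosed by an identical start/end tag within one context.

    Counting from the start of the context, the 1st, 3rd, ... tags are start tags and
    the 2nd, 4th, ... are end tags, so text between an end tag and the next start tag
    (such as 'some text') is never returned.
    """
    pieces = context.split(tag)
    if len(pieces) % 2 == 0:                # an odd number of tags cannot all be paired
        raise ValueError("unpaired tag in context")
    return [piece for k, piece in enumerate(pieces) if k % 2 == 1]


sentence = "/PHRASE/ A /PHRASE/ some text /PHRASE/ B /PHRASE/"
```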

[IF[if we had some eggs then [IF[we could have some ham 'n' eggs if we had some ham.]IF]]IF]

in which, for illustration, start tags have the form '[tag text[' (e.g. '[IF[') and end tags have the form ']tag text]' (e.g. ']IF]'). Although start and end tags are paired, they do not have to have the same tag text. By suitable choice of tag text, marked up portions of text may overlap without being nested, as in the following example:

[IF1[if we had some eggs then [IF2[we could have some ham 'n' eggs]IF1] if we had some ham.]IF2]

in which the start tags have the forms '[IF1[' and '[IF2[' so that they can be distinguished and matched with their corresponding end tags. There are therefore no restrictions on the nesting or overlap of marked up source text. Source text may be subject to multiple independent mark-up 'standards', 'conventions' or 'regimes' at once. For example, the sentences, phrases, nouns, verbs and other parts of speech in a source text may be marked up according to one convention while the chapters, sections, pages and lines are marked up simultaneously according to another. The only requirement is that the sets of tag texts in the two independent conventions do not 'clash'.
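A sketch of a parser for the '[tag text[' ... ']tag text]' convention used in these examples, returning the tag-free text together with each marked up portion's start and end positions. Keeping one stack per tag handles both the nested and the overlapping examples. The rule that tags are matched on the text before the first comma (so that ']Tr]' closes '[Tr, translate[', as in later examples) is an assumption, not part of the Invention.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Start tags look like '[tag text[' and end tags like ']tag text]'
TAG = re.compile(r"\[([^\[\]]*)\[|\]([^\[\]]*)\]")


@dataclass
class Portion:
    tag_text: str
    start: int   # position in the tag-free text where the portion begins
    end: int     # position where it ends (-1 until the end tag is seen)


def key_of(tag_text: str) -> str:
    # Hypothetical matching rule: compare only the text before the first comma
    return tag_text.split(",")[0].strip()


def parse_markup(marked: str) -> Tuple[str, List[Portion]]:
    plain: List[str] = []
    open_tags: Dict[str, List[Portion]] = {}   # one stack per key: allows nesting and overlap
    done: List[Portion] = []
    pos = last = 0
    for m in TAG.finditer(marked):
        chunk = marked[last:m.start()]          # ordinary text before this tag
        plain.append(chunk)
        pos += len(chunk)
        last = m.end()
        if m.group(1) is not None:              # start tag
            open_tags.setdefault(key_of(m.group(1)), []).append(Portion(m.group(1), pos, -1))
        else:                                   # end tag closes the latest open portion with that key
            stack = open_tags.get(key_of(m.group(2)))
            if not stack:
                raise ValueError(f"unmatched end tag {m.group(0)!r}")
            portion = stack.pop()
            portion.end = pos
            done.append(portion)
    plain.append(marked[last:])
    if any(open_tags.values()):
        raise ValueError("start tag without a matching end tag")
    done.sort(key=lambda p: (p.start, -p.end))  # outermost portion first
    return "".join(plain), done
```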

The choice of mark-up may depend not only on content but also on position, context or whatever other criteria the (human or automatic) process marking up the source text decides are appropriate to the purpose.

Identifying Marked Up Portions of Source Text

When a reader requests help by pointing at some position in the source text with some input device, the Automated Text Expert preferably determines all of the marked up portions of the source text that overlap the position. This would require that it find the tags surrounding all marked up portions of text at that position.

(A minor problem is that tag characters may themselves appear in the source text. The usual solution is to choose an escape character, e.g. '/', such that, when placed in front of another character, the escape character causes the other character to be interpreted literally rather than as the start of a tag. For example, '/[' means the character '[' and is not to be interpreted as the start of a tag. Likewise, '//' is equivalent to '/'.)
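A sketch of a tokenizer that honours the escape character. It assumes '/' as the escape and square brackets as the only tag delimiters, so it does not handle tags such as '/PHRASE/', which would clash with '/' as an escape.

```python
from typing import List, Tuple

ESCAPE = "/"   # escape character from the text; tags are assumed to use '[' and ']'


def tokenize(marked: str) -> List[Tuple[str, str]]:
    """Split marked-up text into ('text', s), ('start', tag text) and ('end', tag text) tokens."""
    tokens: List[Tuple[str, str]] = []
    buf: List[str] = []
    i = 0
    while i < len(marked):
        c = marked[i]
        if c == ESCAPE and i + 1 < len(marked):
            buf.append(marked[i + 1])           # '/[' -> '[', '/]' -> ']', '//' -> '/'
            i += 2
        elif c in "[]":
            j = marked.find(c, i + 1)           # tag text runs to the next identical bracket
            if j == -1:
                raise ValueError(f"unterminated tag at index {i}")
            if buf:
                tokens.append(("text", "".join(buf)))
                buf = []
            tokens.append(("start" if c == "[" else "end", marked[i + 1:j]))
            i = j + 1
        else:
            buf.append(c)
            i += 1
    if buf:
        tokens.append(("text", "".join(buf)))
    return tokens
```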

A simple algorithm is to scan the whole source text from the beginning to find all matched pairs of start and end tags (treating all separator tags except the first and last as both start and end tags) until the help position is reached. All portions of text that have had start but not yet corresponding end tags at the help position are eligible for help. It simply remains for the Automated Text Expert to continue scanning through the source to find the end tags of those portions of text. Unfortunately, for large source texts this is very inefficient.
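The simple forward scan can be sketched as follows. It assumes the bracket tag convention above, matches an end tag to the latest open start tag with identical tag text, and omits separator tags for brevity. It stops as soon as the end tags of all eligible portions have been found.

```python
import re
from typing import List, Optional, Tuple

TAG = re.compile(r"\[([^\[\]]*)\[|\]([^\[\]]*)\]")   # '[tag[' start tags, ']tag]' end tags


def help_portions(marked: str, help_pos: int) -> List[Tuple[str, int, Optional[int]]]:
    """Return (tag text, start, end) for each marked up portion that covers the character
    at index help_pos of the tag-free text, scanning forward from the start of the source."""
    pos = last = 0
    open_portions: List[list] = []        # [tag text, start, end] for portions not yet ended
    eligible: Optional[List[list]] = None
    for m in TAG.finditer(marked):
        pos += m.start() - last            # count the ordinary characters since the last tag
        last = m.end()
        if eligible is None and pos > help_pos:
            eligible = list(open_portions) # started but not yet ended at the help position
        if m.group(1) is not None:
            open_portions.append([m.group(1), pos, None])
        else:
            for k in range(len(open_portions) - 1, -1, -1):   # latest matching start tag
                if open_portions[k][0] == m.group(2):
                    ended = open_portions.pop(k)
                    ended[2] = pos
                    break
        if eligible is not None and all(p[2] is not None for p in eligible):
            break                           # every eligible end tag has been found
    if eligible is None:                    # help position lies after the last tag
        eligible = open_portions
    return [tuple(p) for p in eligible]


marked = "SHE WAS [I[OVER [N[THE MOON]N]]I] AT HIS PROPOSAL"
```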

It is possible to improve performance by keeping track of marked up text as the reader scrolls through the source. It is then not necessary to return to the beginning of the source every time the reader requests help, but only to a known position that the reader has already read past. Unfortunately, this algorithm cannot be used if the reader 'jumps about' in the source text.

It is possible to combine both algorithms, i.e. scan from the beginning of the source text when the reader jumps to a new position but continue to track marked up text thereafter as the reader continues to scroll forward through the text.

It is also possible to design mark-up and a scanning algorithm that makes use of knowledge about the mark-up for any particular application. For example, if the whole of the source text is marked up with start and end paragraph delimiters and paragraphs are not nested in any other mark-up then, when the reader jumps to a new position, the Automated Text Expert merely needs to scan back to the start of the surrounding paragraph to establish a new context. If there is no suitable tag type for this purpose in a particular application then it is possible to insert 'dummy' separator tags throughout the source, not nested in any other mark-up, so that the Automated Text Expert can scan back from the help position to the previous dummy tag to establish a new 'mark-up free' context from which to start scanning forward for tags.

An alternative technique (proposed for use in the General Automated Text Expert described below) is to create a separate 'mark-up file' for each source text. Each character in a source text is defined to be at some position: position 0 occurs before the first character, and position n (n not equal to 0) occurs after the nth character from the start of the text.

Mark-up occurs at some position in the source text but does not change the position of characters in the source text. The mark-up file represents a monotonically increasing list of character positions in the source text. Each record of the file describes the mark-up in force at that position. The mark-up file need not list every position, only those at which the mark-up changes, i.e. positions that have start or end tags.

Figure 6 is a diagram of a 'mark-up tuple', a data structure that records the mark-up that covers some portion of the source text in terms of start and end positions. Figure 7 is a diagram of a mark-up file corresponding to the piece of marked up text shown.

Each portion of marked up source text is represented by a mark-up tuple that records its start and end positions. For example, the text marked up as 'A' starts at position a1 and ends at position a2. By convention, each entire source text is marked up with a dummy source start tag at position 0 and a dummy source end tag at the last position of the source text. This tag is denoted by the tag text 'S' in the diagram, at positions s1 and s2 respectively.

Since multiple start and end tags may occur at a single position a sequence of records in the mark-up file may all be for the same position, each recording, successively, a change in the mark-up. (Note that there may be no source text between a start and end tag. This is called dummy mark-up. The mark-up file still records the mark-up in successive records at the mark-up position, however. The reader can never ask for help for such mark-up since the reader always has to point at some character in the source text and not to a position between characters in order to request help. Dummy mark-up can be used by Externally Defined Help Functions, which are explained in the description of the GATE below.)

Construction of a mark-up file from a marked up piece of source text is a straightforward algorithmic process. During a forward scan of the text, each start tag results in the creation of a new file record for the position with, initially, a linked list of tuples which is an identical copy of the list in the previous record. A new tuple representing the new start tag is added to the front of this list with its start position and tag text completed. (The end position in the tuple of this mark-up is filled in when the corresponding end tag is encountered.) When an end tag is encountered a new record for this position is created in the file with, initially, a tuple list identical to the previous record. The copy of the tuple list is searched for the tuple representing the mark-up that has ended with this end tag and this tuple is removed from the list. Before removal, however, all previous copies of this tuple have their end positions completed. Previous copies can be found by linking them together or by scanning record tuple lists from the start position of this particular mark-up onwards, i.e. from the position at which this tuple first appeared in some list.
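A sketch of the construction just described, assuming the forward scan has already produced (position, 'start' or 'end', tag text) events. One deliberate simplification: instead of copying tuples and later completing every copy, the records share the same tuple objects, so filling in an end position once updates every record that refers to it (a form of the linking mentioned above). The names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MarkupTuple:
    tag_text: str
    start: int
    end: Optional[int] = None     # completed when the matching end tag is reached


@dataclass
class Record:
    position: int
    tuples: List[MarkupTuple]     # mark-up in force from this position, newest first


def build_markup_file(events: List[Tuple[int, str, str]], length: int) -> List[Record]:
    """events: (position, 'start' or 'end', tag text) in forward-scan order."""
    # Dummy source tags 'S' at position 0 and at the last position of the text
    events = [(0, "start", "S")] + list(events) + [(length, "end", "S")]
    records: List[Record] = []
    current: List[MarkupTuple] = []
    for position, kind, tag_text in events:
        if kind == "start":
            current = [MarkupTuple(tag_text, position)] + current   # copy, new tuple at front
        else:
            ended = next((t for t in current if t.tag_text == tag_text), None)
            if ended is None:
                raise ValueError(f"end tag {tag_text!r} at {position} has no start tag")
            ended.end = position    # shared object, so earlier records see the end too
            current = [t for t in current if t is not ended]
        records.append(Record(position, current))
    return records


# 'SHE WAS [I[OVER [N[THE MOON]N]]I] AT HIS PROPOSAL' with the tags removed is 37 characters
events = [(8, "start", "I"), (13, "start", "N"), (21, "end", "N"), (21, "end", "I")]
recs = build_markup_file(events, 37)
```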

An alternative method is to create the mark-up file dynamically during the editing process as each piece of mark-up is added (see the section Marking Up Text below).

With the use of a mark-up file it is possible to remove the mark-up tags from the source text. There is one proviso. Notice that in Figure 7 each start tag that is encountered in a forward scan of the source results in a new linked list of tuples for that position including a tuple representing the new start tag.

But each end tag encountered results in a new linked list with the appropriate tuple deleted (since from that position onwards that particular mark-up is no longer in force). If tags are physically removed from the source, then although the position of mark-up is known and the tag text of start tags is captured in the mark-up file tuples, the tag text of end tags is lost. The solution is either to record the tag text of both start and end tags in the tuples representing mark-up, or to agree a convention that end tags add no new information to the mark-up but merely signify the end of a particular type of mark-up.

Knowing the length of the source text, the Automated Text Expert is always able to keep track of the position in the source text of the first character appearing on the display screen as the reader scrolls forwards or backwards through the text or chooses to jump to a new relative position in the text.

Therefore, when the reader requests help by pointing at some position on the display screen, this position corresponds to a known position in the source text. The Automated Text Expert searches through the list of positions in the mark-up file in order to determine the mark-up in force at that position. Since this is a monotonically increasing list of numerical positions it is easy to make this search fast and efficient using standard computer techniques such as index sequential search.
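With the records held in position order, the search can be a binary search. A minimal sketch using Python's `bisect` module, assuming the positions and the mark-up of each record are held in parallel lists (the tag texts here match a small 'S', 'I', 'N' example). Where several records share a position, the last one holds the final state, which is why `bisect_right` is used.

```python
import bisect
from typing import List, Sequence


def markup_at(positions: Sequence[int], markup: Sequence[List[str]], char_index: int) -> List[str]:
    """positions: non-decreasing record positions from a mark-up file;
    markup[k]: tag texts in force from positions[k] onwards.
    The character at char_index lies between positions char_index and char_index + 1,
    so the applicable record is the last one whose position is <= char_index."""
    k = bisect.bisect_right(positions, char_index) - 1
    return markup[k] if k >= 0 else []


positions = [0, 8, 13, 21, 21, 37]
markup = [["S"], ["I", "S"], ["N", "I", "S"], ["I", "S"], ["S"], []]
```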


Basic Help Functions and Help Information Corpus Files

The help information that is provided for each portion of marked up source text is usually obtained by a standard look-up or replacement algorithm, as shown in the examples above. More formally, the help function can be represented as a family of mapping functions h of three parameters:

h(tag text, marked up source text, Help Information Corpus) = help information

To simplify the description, and without loss of generality, assume that only the tag text in the start tag is significant, i.e. that end tags contain no additional information. In general the tag text defines the precise function to be executed, and either or both of the marked up source text and Help Information Corpus parameters are present only if the function thus defined requires them. The Basic Help Functions are a subset of the mapping functions h, for which a more natural and convenient notation is employed.

Associative Look-Up Help Function

For example:

Associative Look-Up('OVER THE MOON', Help Information Corpus for idioms) = 'ECSTATIQUE, LITERALEMENT AU DESSUS DE LA LUNE'

The function is an Associative Look-Up of 'OVER THE MOON' in the Help Information Corpus for idioms. (The choice of Associative Look-Up, and of the particular file of the corpus to use, the idioms file, is determined by the tag text 'I'.) The Associative Look-Up takes two pointer parameters: in the example, a pointer to a character string containing the marked up source text 'OVER THE MOON' and a pointer to the file of idioms in the corpus.

Figure 8a shows that each record in the file of idioms consists of two pointer fields. The first field points to the text of an idiom. The second field in the records of an associative file normally points to the associated help information. The function searches for the idiom in the strings pointed to by the first field of the records and returns the associated pointer in the second field when a match is found. The pointer in the second field need not be a pointer to help information; the Associative Look-Up function does not interpret it, but merely returns it to the calling function or procedure.

Indexed Look-Up Help Function

The following example has no marked up source text parameter:

Indexed Look-Up(2, Help Information Corpus for translated units) = 'Jamais quiconque n'aurait pas imagine qu'ils puissent se trouver impliques dans quoi que ce soit d'etrange ou de mysterieux. Ils n'avaient pas de temps a perdre avec des sornettes.'

The choice of an Indexed Look-Up help function, and of the Help Information Corpus file for translated units, is determined by the tag text part 'T'.

Figure 8b shows this Help Information Corpus file for translated units. It is a straightforward list of pointers. Using an index value of 2, the function returns the second element in the index (which, in this case, is a pointer to the translation). The pointer need not be a pointer to help information; the Indexed Look-Up function does not interpret it, but merely returns it to the calling function or procedure.

The Associative Look-Up and Indexed Look-Up files and functions are analogous to the two main types of computer storage and operations on computer storage, namely (1) associative or content addressable storage and (2) random access storage. When called by the Automated Text Expert, the pointer that each of these functions returns is interpreted as a pointer to a character string of help information, which is then displayed to the reader. But the functions can be called in other contexts in which the pointer returned is interpreted differently. An example is given at the end of this section.

In Line Help Function

It is possible to have a null Help Information Corpus parameter. Consider, for example, the marked up text:

[Q, Quotation from Hamlet[To be or not to be, that is the question.]Q, Quotation from Hamlet]

Formally, the help function is:

h('[Q, Quotation from Hamlet[', null, null) = 'Quotation from Hamlet.'

In other words, in this example the 'Q' tag carries the help information 'in line' in the tag text rather than in a separate Help Information Corpus file, obviating the need for any look-up.

(This function would normally be written as In Line('Quotation from Hamlet'), i.e. the 'Q' part of the tag text selects the In Line function, which has null marked up source text and Help Information Corpus file parameters.)

Void Help Function

The help function for mark-up can be deemed 'void', meaning that there is no help for this particular piece of marked up source text. This is useful for defining the effect of dummy tags.

The Automated Text Expert does not offer the reader help for marked up text whose help function is defined as Void.
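The Basic Help Functions can be sketched as a table selected by the leading part of the tag text, mirroring the family h. The tag texts 'I', 'T, n' and 'Q, ...' follow the examples above; 'D' for dummy (Void) mark-up, the corpus contents and the comma-splitting rule are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical Help Information Corpus files
IDIOMS: Dict[str, str] = {"OVER THE MOON": "ECSTATIQUE, LITERALEMENT AU DESSUS DE LA LUNE"}
TRANSLATED_UNITS: List[str] = ["<translation of unit 1>", "<translation of unit 2>"]


def associative_look_up(source: str, corpus: Dict[str, str]) -> Optional[str]:
    return corpus.get(source)


def indexed_look_up(n: int, corpus: List[str]) -> str:
    return corpus[n - 1]            # units are numbered from 1


def in_line(info: str) -> str:
    return info                     # the help information is carried in the tag text itself


# The leading part of the tag text selects the help function; None marks the Void function.
HelpFunction = Callable[[str, Optional[str]], Optional[str]]
HELP_FUNCTIONS: Dict[str, Optional[HelpFunction]] = {
    "I": lambda arg, source: associative_look_up(source, IDIOMS),
    "T": lambda arg, source: indexed_look_up(int(arg), TRANSLATED_UNITS),
    "Q": lambda arg, source: in_line(arg),
    "D": None,                      # dummy mark-up: no help is offered
}


def h(tag_text: str, source: Optional[str]) -> Optional[str]:
    kind, _, arg = (part.strip() for part in tag_text.partition(","))
    function = HELP_FUNCTIONS[kind]
    return None if function is None else function(arg, source)
```

Returning `None` for Void lets the caller decide not to offer help for that mark-up at all.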

Because help functions return pointers without interpreting their types (like machine code subroutines), the help function can be some combination of Associative Look-Up, Indexed Look-Up and In Line functions, in an order defined by the particular tag text. Consider, for example, the following marked-up text:

[Tr, translate[To be or not to be, that is the question.]Tr]...

and help expression:

Indexed Look-Up(translate, Associative Look-Up('To be or not to be, that is the question.', Help Information Corpus for Tr)) = 'Etre ou ne pas etre, c'est la question.'

And the marked-up text:

[Tr, explain[To be or not to be, that is the question.]Tr]...

and help expression:

Indexed Look-Up(explain, Associative Look-Up('To be or not to be, that is the question.', Help Information Corpus for Tr)) = 'La citation: ' etre ou ne pas etre, c'est la question.' est de hameau de jeu de Shakespear.'

Figure 8c shows that the Associative Look-Up of the source text 'To be or not to be, that is the question.', instead of leading directly to help information, leads to an index of two pieces of help information. The first (signalled by '[Tr, translate[') is a straightforward translation and the second (signalled by '[Tr, explain[') is an explanation, i.e. the index is parameterised by the enumeration type (translate, explain), in which the identifier 'translate' is equivalent to 1 and 'explain' is equivalent to 2. It would also have been possible in this example to organise the Help Information Corpus file as an index of two associative files, one for translations and one for explanations, and to perform an Associative Look-Up on whichever was selected.
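A sketch of the composed help expression, with the enumeration type (translate, explain) as an `IntEnum`. The translation string is taken from the example above; the explanation is a placeholder, and splitting the tag text on its first comma is an assumption based on the examples.

```python
from enum import IntEnum
from typing import Dict, List


class TrSelector(IntEnum):
    # The enumeration type (translate, explain): translate is 1, explain is 2
    TRANSLATE = 1
    EXPLAIN = 2


QUOTE = "To be or not to be, that is the question."
# Hypothetical Help Information Corpus for Tr: the associative file maps source text to an index
CORPUS_FOR_TR: Dict[str, List[str]] = {
    QUOTE: ["Etre ou ne pas etre, c'est la question.", "<explanation of the quotation>"],
}


def associative_look_up(key: str, corpus: Dict[str, List[str]]) -> List[str]:
    return corpus[key]          # returned uninterpreted: here it is an index, not help text


def indexed_look_up(n: int, index: List[str]) -> str:
    return index[n - 1]


def tr_help(tag_text: str, source: str) -> str:
    """Evaluate Indexed Look-Up(selector, Associative Look-Up(source, Corpus for Tr))."""
    _, selector = (part.strip() for part in tag_text.split(",", 1))
    return indexed_look_up(TrSelector[selector.upper()], associative_look_up(source, CORPUS_FOR_TR))
```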

The Basic Help Functions described here (ultimately) produce textual output. Help may be provided in other formats, however. For example, the help information might be the digitised voice of a native speaker of the foreign language speaking the marked-up text, in order to help the reader learn the pronunciation of the language. Or the help information may be a digitised photograph of a person or a place, or a map or other diagram, etc. An individual implementation of the Automated Text Expert may support a number of different output formats. Because the Associative Look-Up and Indexed Look-Up functions return pointers only, the same algorithms may be used for different output formats. It is up to the Automated Text Expert to anticipate the output format returned for each different type of mark-up (tag text).

Marking-Up Text

The method of the Invention depends upon source text being marked up appropriately, either with tags physically included in the text or by means of a separate mark-up file that defines the mark-up and its position within the source text. The method does not prescribe the way in which source text is marked up, since this may be an automatic, manual or semi-automatic process. As an illustration, the following method of marking up source text is envisaged as typical:

• For each Help Information Corpus file that is used for Associative Look-Up, the source text is scanned by an automatic computer process to identify elements in the file that may be marked up, e.g. Nouns, Verbs, Idioms, etc. Subsequently, a manual scan by an editor adjusts the mark-up (and possibly the Help Information Corpus file) and corrects mistakes (e.g. identifies non-idiomatic uses of 'OVER THE MOON' and removes the mark-up from them).

Alternatively, a semi-automatic process is possible that, like a spell checker in a word processor, takes the editor through the text suggesting mark-up, which the editor may accept or reject. If the editor inserts new marked-up source text at some point, the process may continue from the beginning once the end has been reached, until the whole text has been checked.
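The automatic scan followed by spell-checker-style review can be sketched as below. This is an illustrative sketch, assuming the idiom part of the Help Information Corpus is a plain list of phrases and that the editor's accept/reject decision is supplied as a callback; the function name `suggest_idiom_mark_up` and the case-insensitive whole-phrase matching rule are assumptions, while the `[I[...]I]` tag form follows the examples in this description.

```python
import re
from typing import Callable, Iterable


def suggest_idiom_mark_up(
    text: str,
    idioms: Iterable[str],
    accept: Callable[[str, int], bool],
) -> str:
    """Offer each idiom occurrence to the editor and tag the accepted ones.

    `accept(matched_text, offset)` stands in for the editor's decision:
    True keeps the [I[...]I] mark-up, False rejects it (e.g. a literal use).
    """
    phrases = sorted(set(idioms), key=len, reverse=True)  # longest first
    if not phrases:
        return text
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        pieces.append(text[last:match.start()])
        phrase = match.group(0)
        pieces.append("[I[" + phrase + "]I]" if accept(phrase, match.start()) else phrase)
        last = match.end()
    pieces.append(text[last:])
    return "".join(pieces)


# Usage: the editor accepts the first (idiomatic) use and rejects the literal one.
text = "She was over the moon. The cow jumped over the moon."
marked = suggest_idiom_mark_up(text, ["OVER THE MOON"], lambda phrase, offset: offset < 20)
# marked == "She was [I[over the moon]I]. The cow jumped over the moon."
```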

For each Help Information Corpus file that is used for Indexed Look-Up, the source text is scanned by an automatic computer process in order to create an index of associations between marked-up source text and the Help Information Corpus file. It may or may not be possible to automate, or even semi-automate, this process, depending on the nature of the source text and the desired mark-up. For example, imagine a book has been translated from English into French. A semi-automatic mark-up process works on the principle that the chapters of the book and its translation map one-for-one. The process assumes that the paragraphs within chapters of the book coincide one-for-one with paragraphs within chapters of the translation, and that the sentences within paragraphs of the book coincide one-for-one with sentences within paragraphs of the translation. The book and its translation are displayed side by side on a screen for a human editor to review. Sentence by sentence, the process suggests a mapping of a book sentence to a translation sentence. The editor either accepts the suggestion, with a single keystroke say, in which case the sentence and its translation are marked up automatically and the process moves to the next pair of sentences, or adjusts the mark-up by hand on the screen to capture the correspondence between book text and translation text. Subsequently, when a sentence boundary is again reached in both the book and its translation, the process of suggesting correspondences resumes, and this editing process continues until the whole book has been marked up.
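The sentence-by-sentence proposal step can be sketched for one pair of already-matched paragraphs. This is a simplified sketch under stated assumptions: chapters and paragraphs have already been paired, sentences end in '.', '!' or '?', and the editor's keystroke is modelled as a `review` callback; `split_sentences`, `align_paragraph` and the (start, end, index) mark-up triples are hypothetical names and layouts, and manual adjustment of boundaries is not modelled.

```python
import re
from typing import Callable, Dict, List, Tuple

# Naive sentence splitter (an assumption): a sentence is a run of text ending
# in '.', '!' or '?'. Real text needs a proper tokenizer (abbreviations, quotes).
_SENTENCE = re.compile(r"[^.!?]+[.!?]")


def split_sentences(paragraph: str) -> List[Tuple[int, int]]:
    """Return (start, end) character spans of sentences, leading spaces trimmed."""
    spans = []
    for m in _SENTENCE.finditer(paragraph):
        lead = len(m.group(0)) - len(m.group(0).lstrip())
        spans.append((m.start() + lead, m.end()))
    return spans


def align_paragraph(
    source: str,
    translation: str,
    review: Callable[[str, str], bool],
    first_index: int = 1,
) -> Tuple[List[Tuple[int, int, int]], Dict[int, str]]:
    """Propose one-for-one sentence pairs; record the pairs the editor accepts.

    Returns the mark-up for the source paragraph as (start, end, index) triples
    and the index file mapping each index to its translated sentence.
    """
    mark_up: List[Tuple[int, int, int]] = []
    index_file: Dict[int, str] = {}
    n = first_index
    pairs = zip(split_sentences(source), split_sentences(translation))
    for (s0, s1), (t0, t1) in pairs:
        src, tgt = source[s0:s1], translation[t0:t1]
        if review(src, tgt):          # editor accepts with a single keystroke
            mark_up.append((s0, s1, n))
            index_file[n] = tgt
            n += 1
        # In this sketch a rejected proposal is simply skipped.
    return mark_up, index_file


mark_up, index_file = align_paragraph(
    "I am here. You are there.", "Je suis ici. Tu es là.", lambda s, t: True
)
```

Because `zip` stops at the shorter sentence list, a paragraph whose sentence counts differ leaves its tail for the editor to align by hand.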

It is also possible to mark up text and synchronise it with its translation while the translation is being made, and human translators can be offered similar simple computer assistance for this purpose, i.e. mark-up can be made an automatic side-effect of some other editing or processing function.

In many current applications, text is processed for a variety of reasons, often using sophisticated algorithms, e.g. to assign topic metadata in order to assist search and retrieval, to perform entity recognition of people, places and other things in the text, to perform natural language processing, and so on. These algorithms can be augmented to mark-up source text in the course of their operations in order to associate help information with relevant portions of it. A separate manual scan of the source text may be performed to insert In-Line help information, to insert dummy tags at appropriate positions in the source if required, and to create any other mark-up for which there is no simple automatic process to provide assistance.

• The various marked-up texts produced by the automatic, semi-automatic and manual processes described above are merged to produce the final marked-up source text. This may be in the form of tags inserted into the source text or in the form of a separate mark-up file. At that point, the mark-up is also checked to be 'well-formed', i.e. that every tag is properly written, that start tags have matching end tags, that tags from different mark-up processes do not clash, etc. Automatic processes should produce well-formed mark-up, but subsequent manual adjustments and merging may introduce mark-up errors. These are picked up at this final stage, during which a human mark-up expert is given the opportunity to make adjustments and corrections to any erroneous mark-up detected.
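Part of the well-formedness check (matching start and end tags, correct nesting) can be sketched with a stack. The sketch assumes the bracket notation used in this description, a start tag '[identifier, parameters[' and an end tag ']identifier]', and that the source text itself contains no square brackets; `check_well_formed` is a hypothetical name, and checking for clashes between mark-up processes is not covered.

```python
import re
from typing import List

# Start tag "[identifier, params[" or end tag "]identifier]", assuming the
# source text itself contains no square brackets.
_TAG = re.compile(r"\[([^\[\]]+)\[|\]([^\[\]]+)\]")


def _identifier(tag_text: str) -> str:
    """The tag identifier is the tag text up to the first comma."""
    return tag_text.split(",", 1)[0].strip()


def check_well_formed(marked_up: str) -> List[str]:
    """Return a list of mark-up errors; an empty list means well-formed."""
    errors: List[str] = []
    stack = []  # (identifier, offset) of start tags not yet closed
    for m in _TAG.finditer(marked_up):
        if m.group(1) is not None:                     # start tag
            stack.append((_identifier(m.group(1)), m.start()))
            continue
        ident = _identifier(m.group(2))                # end tag
        if not stack:
            errors.append(f"end tag ]{ident}] at {m.start()} has no start tag")
        elif stack[-1][0] != ident:
            errors.append(
                f"end tag ]{ident}] at {m.start()} does not match "
                f"start tag for {stack[-1][0]} at {stack[-1][1]}"
            )
        else:
            stack.pop()
    for ident, offset in stack:
        errors.append(f"start tag for {ident} at {offset} is never closed")
    return errors
```

For example, a mistyped end tag such as `[LV[held]LV1` is reported as a start tag that is never closed.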

General Automated Text Expert (GATE)

In the sections above describing Mark-up, Identifying Marked Up Portions of Source Text, and the Basic Help Functions and Help Information Corpus Files, the Automated Text Expert was described as an application-specific process that uses the method of the Invention to associate help information with a source text. It is also possible to create a General Automated Text Expert (abbreviated GATE) that can be used for many help applications, i.e. a 'shell' process that, by means of suitable parameterisation, can become an Automated Text Expert for a particular source text and help purpose, or for a class of source texts and help purpose. An example of how to construct a General Automated Text Expert is described here in order to illustrate the principle. Conventions and data structures different from those chosen here for illustration could be used to achieve the same purpose without altering the principle.

A set of conventions is adopted to define operational parameters to the GATE. Firstly, a convention is defined for the representation of tag text for use by the GATE. For example, assume that the GATE uses a mark-up file instead of expecting tags to be inserted into the source. Further, assume that the tag text has the form: tag identifier, followed by a comma-separated list of tag parameters, and that tag identifiers are unique, i.e. that no clash is permitted even between tag identifiers (as opposed to tag text as a whole).

Figure 9 shows an example form for individual tuples in the mark-up file describing a marked-up portion of the source text. Instead of pointing to the tag text, field 3 of the tuple points to a linked list of items. The first item in this list points to the tag identifier text string, the second item points to the first tag text parameter string, the third item to the second tag text parameter string, and so on. The tag identifiers to be used for the application are defined to the GATE.
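The tuple of Figure 9, whose field 3 points to a linked list holding the tag identifier and then each tag parameter, might be modelled as below. This is an illustrative sketch only: the figure is not reproduced here, so the assumption that the other fields hold start and end character positions, and the names `Node`, `MarkUpTuple` and `parse_tag_text`, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Node:
    """One linked-list item: the tag identifier or a tag parameter string."""
    text: str
    next: Optional["Node"] = None


@dataclass
class MarkUpTuple:
    """Hypothetical tuple layout; only field 3 is taken from the description."""
    start: int       # assumed field 1: start character position
    end: int         # assumed field 2: end character position
    tag_list: Node   # field 3: identifier first, then parameters in order

    def strings(self) -> Iterator[str]:
        """Walk the linked list from the tag identifier onwards."""
        node: Optional[Node] = self.tag_list
        while node is not None:
            yield node.text
            node = node.next

    @property
    def identifier(self) -> str:
        return self.tag_list.text

    @property
    def parameters(self) -> List[str]:
        return list(self.strings())[1:]


def parse_tag_text(tag_text: str) -> Node:
    """Build the linked list from tag text 'identifier, param1, param2, ...'."""
    parts = [p.strip() for p in tag_text.split(",")]
    head = Node(parts[-1])
    for part in reversed(parts[:-1]):
        head = Node(part, head)
    return head


quotation = MarkUpTuple(0, 41, parse_tag_text("Tr, explain"))
```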

Labels for the help menu displayed to the reader are assigned for each tag identifier (e.g. 'Idiom' for 'I', 'Noun' for 'N', etc.) and defined to the GATE.

Each tag identifier has an associated help expression defined for it. When a reader requests help for a particular marked-up portion of the source text (described by a tuple in the mark-up file), the GATE 'tuple evaluator' function executes the corresponding help expression. Help expressions are functional expressions made up of the Basic Help Functions and possibly Externally Defined Help Functions:

(a) A Basic Help Function is either the Associative Look-Up, Indexed Look-Up or In Line help function described above. For each look-up function it is necessary to supply to the GATE the appropriate part of the Help Information Corpus, organised as a file structure for Associative or Indexed Look-Up as required.

(b) An Externally Defined Help Function is an optional computer function defined by an external provider designing the help application, usually written in some standard computer programming language and compiled or linked into the GATE for the purpose of providing help information not available using the Basic Help Functions.

Externally Defined Help Functions can have side effects, e.g. to remember 'state' information from one function call to the next. Therefore, the GATE calls a function 'Initialise()' at the start of its operation, and a function 'Terminate()' just before the end of its operation. These functions have no effect unless Externally Defined Help Functions with the same names are defined to the GATE to replace them. Externally Defined Initialise() and Terminate() Help Functions must return void; they can initialise state variables at the start of the GATE's operation and process the cumulative contents of state variables at the end, respectively.
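The default-and-override behaviour of Initialise() and Terminate() can be sketched as a registry in which externally defined functions replace no-op defaults by name. The `Gate` class, its `define` and `run` methods, and the request-counting state in the usage example are assumptions for illustration; in the GATE proper, help functions are reached through help expressions rather than called directly by name.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def _no_op() -> None:
    """Default Initialise()/Terminate(): does nothing."""


class Gate:
    """Hypothetical minimal shell showing hooks that replace no-op defaults."""

    def __init__(self) -> None:
        self.functions: Dict[str, Callable[..., Any]] = {
            "Initialise": _no_op,
            "Terminate": _no_op,
        }

    def define(self, name: str, func: Callable[..., Any]) -> None:
        """Define an Externally Defined Help Function to the GATE by name."""
        self.functions[name] = func

    def run(self, requests: Iterable[Tuple[str, tuple]]) -> list:
        """Call Initialise(), evaluate each help request, then Terminate()."""
        self.functions["Initialise"]()
        results = [self.functions[name](*args) for name, args in requests]
        self.functions["Terminate"]()
        return results


# Usage: state variables accumulate across calls and are processed at the end.
state = {"count": 0, "report": ""}


def initialise() -> None:
    state["count"] = 0


def count_and_echo(word: str) -> str:
    state["count"] += 1
    return word


def terminate() -> None:
    state["report"] = f"{state['count']} help requests"


gate = Gate()
gate.define("Initialise", initialise)
gate.define("Terminate", terminate)
gate.define("Echo", count_and_echo)
results = gate.run([("Echo", ("over the moon",)), ("Echo", ("butcher's",))])
```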

Typically an Externally Defined Help Function will interpret the tag identifier and tag parameters in order to determine what action to take. The text of parameters may be interpreted in any way that the provider of the function determines, e.g. the text string parameter '20022002' could be interpreted as '20 February 2002' by a function.
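As a small illustration of parameter interpretation, the function below reads an eight-digit parameter as a date. The DDMMYYYY convention and the name `interpret_date_parameter` are assumptions standing in for whatever convention a function's provider chooses; month names are spelled out so the result does not depend on the process locale.

```python
from datetime import datetime

# Month names spelled out so the result does not depend on the process locale.
_MONTHS = ("January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December")


def interpret_date_parameter(param: str) -> str:
    """Read an 8-digit DDMMYYYY tag parameter as a date (assumed convention).

    Raises ValueError for strings that are not a valid DDMMYYYY date.
    """
    if len(param) != 8 or not param.isdigit():
        raise ValueError(f"expected 8 digits, got {param!r}")
    date = datetime.strptime(param, "%d%m%Y")
    return f"{date.day} {_MONTHS[date.month - 1]} {date.year}"
```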

There are no restrictions on Externally Defined Help Functions except that, if called at the outermost level of a help expression by the GATE, they must return a pointer to a (possibly empty) character string (which the GATE interprets as textual help information). As with Basic Help Functions, however, a number of different Externally Defined Help Functions may be defined for different output types.

In defining the help expression to be evaluated for each tag identifier, the following notation convention is used:

%S represents a pointer to a string containing the marked-up portion of the source text.

%I represents a pointer to the tag identifier (from the list of tag text parameters in the tuple).

%1 represents a pointer to the first tag parameter (from the list of tag text parameters).

%2 represents a pointer to the second tag parameter (from the list of tag text parameters).

Etc.

help information corpus:%I represents a pointer to the Help Information Corpus file for tag identifier %I.

%T represents a pointer to the mark-up file record representing the position of the start tag.
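One way the GATE might bind this notation before evaluating a help expression is sketched below, assuming help expressions are stored as text templates. The function `bind_help_expression`, the quoting of %S and the opaque `record_ref` string used for %T are hypothetical; every placeholder is replaced in a single pass, so text substituted for %S is never re-scanned.

```python
import re
from typing import List

# %S, %I, %T or a parameter number such as %1, %2, ...
_PLACEHOLDER = re.compile(r"%(S|I|T|\d+)")


def bind_help_expression(
    template: str,
    source_text: str,
    identifier: str,
    parameters: List[str],
    record_ref: str,
) -> str:
    """Fill in the notation of a help expression for one marked-up portion."""
    def replace(m: "re.Match[str]") -> str:
        key = m.group(1)
        if key == "S":
            return repr(source_text)      # the marked-up source text, quoted
        if key == "I":
            return identifier             # so 'corpus:%I' becomes 'corpus:Tr'
        if key == "T":
            return record_ref             # opaque reference to the start-tag record
        number = int(key)
        if not 1 <= number <= len(parameters):
            raise ValueError(f"%{key} has no matching tag parameter")
        return parameters[number - 1]

    return _PLACEHOLDER.sub(replace, template)


expression = bind_help_expression(
    "Indexed Look-Up(%1, Associative Look-Up(%S, help information corpus:%I))",
    "To be or not to be, that is the question.", "Tr", ["explain"], "record-17",
)
```

With the quotation used in example 2, `expression` becomes the completed expression shown there, with `explain` and `help information corpus:Tr` filled in.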

Examples:

1. For the tag identifier 'I' (Idiom), the help expression is defined to the GATE as: Associative Look-Up(%S, help information corpus:I). Figure 10a shows the tuple for the idiom resulting from scanning the source text: [I[OVER THE MOON]I]...

If the reader requests help with this idiom then the GATE executes the assigned expression with parameters filled in appropriately, e.g. Associative Look-Up('OVER THE MOON', help information corpus:I) where 'help information corpus:I' is a pointer to the Help Information Corpus file supplied to the GATE for idioms.

2. As a second, slightly more complicated example, consider the following help expression for tag identifier 'Tr': Indexed Look-Up(%1, Associative Look-Up(%S, help information corpus:Tr)). When called by the GATE for the marked-up text: [Tr, explain[To be or not to be, that is the question.]Tr]...

the tuple for this quotation is constructed as shown in Figure 10b and the expression is completed as: Indexed Look-Up(explain, Associative Look-Up('To be or not to be, that is the question.', help information corpus:Tr)). As explained above for Figure 8c, the inner Associative Look-Up looks up the quotation in the help information file provided to the GATE and pointed to by 'help information corpus:Tr'.

This look-up delivers a pointer to an index file with two entries. Then the outer Indexed Look-Up looks up the 'explain' item in this resulting file and returns the explanation:

'La citation « Être ou ne pas être, c'est la question » est tirée de Hamlet de Shakespeare.'

3. Consider the following mark-up: [Sentence[[Stock, MKS.L[Marks & Spencer]Stock] reported bumper sales over the 2001 Christmas period.]Sentence] and the help expression for the 'Stock' tag identifier: Get Trade(%1). The Externally Defined Help Function 'Get Trade' returns a pointer to a stock exchange price for the stock symbol passed to it as tag text parameter %1, e.g.

| MARKS & SPENCER | London | 364.50 | +10.0 | 2.82% | 825,181 |

Note that the Get Trade function is defined to the GATE as returning a pointer to a different output type, a 'formatted table' rather than a character string, for the GATE to display.
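Examples 1 to 3 can be drawn together in a small sketch of a 'tuple evaluator' in which each tag identifier maps to its help expression, written here as a Python function of the bound tuple. Everything named here is hypothetical: `BoundTuple`, the corpus contents, and the stubbed `get_trade`, which returns the illustrative row above rather than querying a real price feed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class BoundTuple:
    """The parts of a mark-up tuple that a help expression can refer to."""
    source_text: str                                       # %S
    identifier: str                                        # %I
    parameters: List[str] = field(default_factory=list)    # %1, %2, ...


# Hypothetical Help Information Corpus contents.
IDIOMS = {"OVER THE MOON": "Very happy; delighted."}
TRANSLATIONS = {
    "To be or not to be, that is the question.": {
        "translate": "Être ou ne pas être, c'est la question.",
        "explain": "Une citation de Hamlet de Shakespeare.",
    },
}


def get_trade(symbol: str) -> Dict[str, object]:
    """Stub for the Externally Defined 'Get Trade' function: no real price feed."""
    quotes = {
        "MKS.L": {"name": "MARKS & SPENCER", "exchange": "London",
                  "price": 364.50, "change": 10.0, "change_pct": 2.82,
                  "volume": 825181},
    }
    return quotes[symbol]


# One help expression per tag identifier, mirroring examples 1 to 3.
HELP_EXPRESSIONS: Dict[str, Callable[[BoundTuple], object]] = {
    # Associative Look-Up(%S, help information corpus:I)
    "I": lambda t: IDIOMS.get(t.source_text),
    # Indexed Look-Up(%1, Associative Look-Up(%S, help information corpus:Tr))
    "Tr": lambda t: TRANSLATIONS.get(t.source_text, {}).get(t.parameters[0]),
    # Get Trade(%1): returns a 'formatted table' (here a dict), not a string
    "Stock": lambda t: get_trade(t.parameters[0]),
}


def tuple_evaluator(t: BoundTuple) -> Optional[object]:
    """Execute the help expression defined for the tuple's tag identifier."""
    expression = HELP_EXPRESSIONS.get(t.identifier)
    return expression(t) if expression is not None else None
```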

4. The %T parameter gives an Externally Defined Help Function a pointer to the mark-up file record created for the position of the start tag in the source text. From this the function can find the tuple representing this marked-up portion of text. From the tuple the function can find the tag parameters and the marked-up source text. It can also find the context in which the mark-up occurred, since the following tuples in the tuple list of the record describe this mark-up context. Also, since it knows the start and end positions of the mark-up from the start tag tuple, it is able to find any embedded mark-up by examining the records in the mark-up file between those two positions. This enables an Externally Defined Help Function to determine, if necessary, the complete context in which the reader has requested help. Consider, for example, the following mark-up: [L[[T,n[[LV[Help]LV], [LP[I]LP] [LV[am]LV] [LV[held]LV] [LN[prisoner]LN].]T,n]]L] but imagine it in the form of a mark-up file without physical mark-up tags in the text, so the text reads: Help, I am held prisoner.

The T tag identifier results in a translation of the source text by Indexed Look-Up of sentence 'n', as described earlier, and produces, say: 'Aide, je suis prisonnier tenu.' Imagine that the L tag has a help menu label 'Translate Literally' and an associated help expression: Literal(%T), where 'Literal' is an Externally Defined Help Function. The function produces the slightly different literal translation 'Aide, je suis tenu prisonnier' by translating each marked-up word in the sentence and joining the results together using the same punctuation. Literal does this by finding each embedded piece of mark-up for which a literal translation is available (the tags LV, LP and LN) and, for each, making a recursive call to the GATE 'tuple evaluator' function. This function determines and executes the help expression for each tag identifier and, in this example, returns the character string translation of each word by Associative Look-Up in a language dictionary file.
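The recursive pattern used by Literal can be sketched as follows. The sketch assumes the mark-up file is a list of (start, end, tag text) records over the plain source text, with the record's list index standing in for %T; the records, the word dictionary and the names `tuple_evaluator` and `literal` are hypothetical, and punctuation and spaces between marked-up words are copied unchanged.

```python
from typing import List, Tuple

# Assumed mark-up file layout: (start, end, tag text), positions in SOURCE.
SOURCE = "Help, I am held prisoner."
MARK_UP: List[Tuple[int, int, str]] = [
    (0, 25, "L"), (0, 25, "T,n"),
    (0, 4, "LV"), (6, 7, "LP"), (8, 10, "LV"), (11, 15, "LV"), (16, 24, "LN"),
]
LITERAL_TAGS = {"LV", "LP", "LN"}
# Hypothetical word-level language dictionary file for Associative Look-Up.
DICTIONARY = {"Help": "Aide", "I": "je", "am": "suis", "held": "tenu",
              "prisoner": "prisonnier"}


def identifier(tag_text: str) -> str:
    return tag_text.split(",", 1)[0].strip()


def tuple_evaluator(record: int) -> str:
    """Execute the help expression for the mark-up record at this index."""
    start, end, tag_text = MARK_UP[record]
    tag = identifier(tag_text)
    if tag in LITERAL_TAGS:
        word = SOURCE[start:end]
        return DICTIONARY.get(word, word)   # Associative Look-Up(%S, dictionary)
    if tag == "L":
        return literal(record)              # Literal(%T)
    raise KeyError(f"no help expression defined for {tag!r}")


def literal(record: int) -> str:
    """Translate each embedded word tag, keeping the source punctuation."""
    start, end, _ = MARK_UP[record]
    embedded = sorted(
        (i for i, (s, e, t) in enumerate(MARK_UP)
         if identifier(t) in LITERAL_TAGS and start <= s and e <= end),
        key=lambda i: MARK_UP[i][0],
    )
    pieces: List[str] = []
    cursor = start
    for i in embedded:
        s, e, _ = MARK_UP[i]
        pieces.append(SOURCE[cursor:s])    # spaces and punctuation unchanged
        pieces.append(tuple_evaluator(i))  # recursive call into the evaluator
        cursor = e
    pieces.append(SOURCE[cursor:end])
    return "".join(pieces)
```

Evaluating record 0 (the L tag) yields 'Aide, je suis tenu prisonnier.', the literal rendering described above.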

This illustrates that the programming of Externally Defined Help Functions may make use of knowledge of the structure of mark-up of a particular source text or class of help application.

Although the %T parameter appears to offer great flexibility to Externally Defined Help Functions, the static context and structure of mark-up are known at editing time, when source text is marked up, and it may be easier to provide the help desired by careful adjustment of the mark-up and preparation of help information files than by developing sophisticated Externally Defined Help Functions. So, in the example above, two Indexed Look-Up files, one for proper translations and one for literal translations ('L,n'), could have been prepared, and the help expression for the tag identifier 'L' could have been defined simply as: Indexed Look-Up('n', help information corpus:L).

The %T parameter can also be used to determine the dynamic context in which help information is to be provided. For example, help information for the Sentence about Marks & Spencer in example 3 above might read either 'Despite reported bumper sales over the 2001 Christmas period the company's share price fell' or 'Because of reported bumper sales over the 2001 Christmas period the company's share price rose', depending upon the direction of price movement reported for the stock by the Get Trade function. Dummy mark-up allows information to be obtained dynamically without supplying this information to the reader.
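The dynamic-context case can be sketched as a help function that reads the price change for the embedded stock mark-up and chooses the wording accordingly. The `price_change` stub, the injectable `get_change` parameter and the wording for an unchanged price are assumptions; the other two sentences are those given above.

```python
from typing import Callable


def price_change(symbol: str) -> float:
    """Hypothetical stand-in for the price change reported by Get Trade."""
    return {"MKS.L": 10.0}.get(symbol, 0.0)


def sales_sentence_help(symbol: str,
                        get_change: Callable[[str], float] = price_change) -> str:
    """Word the help for the Marks & Spencer sentence by price direction."""
    change = get_change(symbol)   # obtained via mark-up, not shown to the reader
    if change < 0:
        return ("Despite reported bumper sales over the 2001 Christmas period "
                "the company's share price fell")
    if change > 0:
        return ("Because of reported bumper sales over the 2001 Christmas period "
                "the company's share price rose")
    return "The company's share price was unchanged"   # assumed third case
```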

For many applications the Basic Help Functions are adequate for the purpose. This makes the GATE a generally useful help information process even without the addition of Externally Defined Help Functions.

Advantages of the Invention over Automatic Text Processing or Recognition Methods and Systems and over Machine Translation Systems

Although the Invention has many applications, it has particular application to foreign language understanding and learning. In recent years there has been much research into automatic text processing or recognition for a variety of purposes, including natural language processing and machine translation of foreign languages. This section compares the Invention with automatic text processing or recognition methods, in particular by comparing the Invention's use for foreign language understanding and learning with machine translation systems that depend upon automatic text processing or recognition methods.

In order to provide a translation, machine translation systems have to analyse the source text in order to automatically 'recognise' words, phrases, parts of speech, etc. for which a translation is to be constructed. In order to deal with idioms, slang (for example, 'butcher's' for 'look' in Cockney rhyming slang), etc., these systems have to be programmed with complex 'rules' in order for them to recognise when a word or phrase is not being used literally. Even so, many machine translation systems would not be able to distinguish, with complete accuracy, idiomatic and non-idiomatic uses of the same phrases, as in the 'OVER THE MOON' example earlier. Since there is no end to the inventiveness of poets and novelists in the metaphorical use of language, the recognition of metaphors poses a particular problem for machine translation systems. This Invention can provide a good translation that suffers from none of these limitations, because it does not depend on any automatic text processing or recognition technique from which the translation is deduced.

In this Invention the content of the source text is only one possible input in determining the help information generated and it may not even be relevant to the derivation of help information in some cases, e.g. as in the Indexed Look-Up in the example in Figure 4. This is totally different from the approach taken in systems that depend upon automatic text processing or recognition as the basis for deriving output.

Machine translation systems can attempt a translation of ad hoc text, for example a phrase or sentence typed into the system by a user. This Invention is not intended for this type of application. This Invention depends upon a pre-existing source text, for example a published work of literature or a story on a news wire, being marked up appropriately, with the provision of help information files and help functions specific to the help purpose.

Human experts may adjust and change the mark-up of a source text, help information and functions for any help purposes they choose and these decisions may vary so greatly and be so qualitative that it is impossible to design any automatic system capable of encoding them all in any systematic way. Some examples are:

The help functions may vary from time to time. For example, more extensive help may be provided to a novice reader than to a more experienced reader. So, for example, less help may be provided towards the end of a source text than at the beginning, or from one text to the next, as a human mark-up editor determines.

The interpretation of source text may depend upon a complex semantic context, even down to the level of a previously shared experience between the writer of the text and its particular readers. In this Invention the help translation is not necessarily generated automatically but can be created with the aid of a human mark-up editor who can take into account the total semantic context.

As well as difficulties in recognition, machine translation systems have difficulty in generating good help information (translations) for all applications: As illustrated in the example in Figure 4 a translation may be literary, i.e. the translator may choose a style of language and forms of expression suited to the material, e.g. a children's book. A translator may choose to translate the same phrase or sentence differently in different contexts because a subtly different expression may better capture the original author's intention. At the moment this is beyond state-of-the-art machine translation systems.

The Invention provides for alternatives to translation. For example, rather than translate 'To be or not to be, that is the question' literally, the help information can provide the explanation that this is a quotation from Hamlet, as illustrated in the sections above.

At any position in a source text, the method of the Invention allows many different types of help to be provided simultaneously, as the mark-up at that position defines. The reader could request, for example, a translation of a foreign language sentence as an expert human translator would render it, in order to understand its nuances, and, for comparison, a literal word-for-word translation of the sentence, to understand how its meaning has been constructed from the words of the foreign language. Such help is clearly within the scope of this Invention, whereas current machine translation systems are not yet good enough to meet such learning objectives.

The limitations of automatic systems are not restricted to textual output. For example, text-to-speech synthesisers can produce an audio equivalent of an ad hoc piece of text typed into the system by the listener. This Invention is not intended for this type of application. This Invention depends upon mark-up of a pre-existing text and the preparation of appropriate audio help files for use with it. Speech synthesisers currently suffer from shortcomings similar to those of machine translators, in not being able to match the quality of output produced by a native human speaker. In large part this is due to the difficulty in recognising the required speech volume, pace, inflexion and other characteristics of speech that depend on the meaning and context of the text. Also, speech synthesis cannot yet accurately mimic all possible characteristics of the human voice. This Invention provides an alternative, better method of providing speech, for example to learn about English dialects by digitising the voices of regional speakers speaking sentences from a prepared text.

General Automated Video Expert (GAVE)

The Automated Video Expert is a video variation of the Automated Text Expert and can be generalised to a process called the General Automated Video Expert (abbreviated GAVE) that provides help information while a viewer is watching a video programme on a digital screen. For simplicity, the GAVE is described by reference to the principles of the GATE, which are described in a previous section. At any moment while watching a video programme, the viewer can point to a position on the digital screen using some input device and be given a Help Menu that depends upon the mark-up in force at that position in the video stream. As in the GATE, mark-up is in the form of a mark-up file independent of the video stream. The mark-up file describes the mark-up in force at any position. The video stream is conceived as a sequence of video frames, each of which is conceived to be a sequence of 'mark-up pixels'. Each (mark-up) pixel is at some position in the frame and, therefore, at some position in the video stream. In the GAVE, mark-up is defined in terms of pixel positions, analogous to the definition of mark-up in terms of character positions in the GATE.

In general 'mark-up pixels' will map to display pixels on a digital screen. When a user points to a display pixel there may be ambiguity about the mark-up in force at that display position depending upon the relative resolution of mark-up and display pixels. In this case the viewer is offered a Help Menu with all possible help options at the display position chosen.

Whole video frames may be marked up with associated help information e.g. sub-titles. Alternatively, only portions of frames may be marked up. For example, a person appearing in a video may be marked up so that pointing at the person elicits help information, e.g. the name of the person, the role being played, the person's biography, previous screen credits, etc. Clearly, therefore, the mark-up of an entity such as a person is usually spread across the video stream and consists of a number of mark-up fragments each of which may provide help information about the entity when referenced by the viewer.

Mark-up may overlap e.g. when a person stands in front of a well-known building. The viewer may request help by pointing at the person and the GAVE may offer a help menu with options for the person and the building for the viewer to choose from.
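How the GAVE might turn a pointer position into a Help Menu can be sketched as follows, assuming each mark-up fragment is stored as a rectangle of mark-up pixels over a range of frames and that the mark-up and display resolutions are known. The `Region` type, the rectangle representation and the scaling rule are assumptions; the sketch shows both sources of multiple options described above: one display pixel covering several mark-up pixels, and overlapping fragments such as a person in front of a building.

```python
from dataclasses import dataclass
from typing import List, Tuple


def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)


@dataclass
class Region:
    """One mark-up fragment: a rectangle of mark-up pixels over a frame range."""
    label: str        # help menu label
    first_frame: int
    last_frame: int   # inclusive
    x0: int
    y0: int
    x1: int           # exclusive
    y1: int           # exclusive

    def overlaps(self, frame: int, x0: int, y0: int, x1: int, y1: int) -> bool:
        """True if this fragment is in force at `frame` and meets the rectangle."""
        return (self.first_frame <= frame <= self.last_frame
                and self.x0 < x1 and x0 < self.x1
                and self.y0 < y1 and y0 < self.y1)


def help_menu(regions: List[Region], frame: int, dx: int, dy: int,
              display: Tuple[int, int], mark_up: Tuple[int, int]) -> List[str]:
    """Menu labels for every mark-up fragment under display pixel (dx, dy)."""
    (dw, dh), (mw, mh) = display, mark_up
    # The display pixel covers this half-open block of mark-up pixels, which may
    # span several mark-up pixels when the two resolutions differ.
    x0, x1 = dx * mw // dw, _ceil_div((dx + 1) * mw, dw)
    y0, y1 = dy * mh // dh, _ceil_div((dy + 1) * mh, dh)
    return [r.label for r in regions if r.overlaps(frame, x0, y0, x1, y1)]


person = Region("Person", 0, 100, 10, 10, 20, 30)
building = Region("Building", 0, 100, 0, 0, 50, 25)
menu = help_menu([person, building], frame=5, dx=30, dy=30,
                 display=(100, 100), mark_up=(50, 50))
```

Here `menu` offers both the person and the building, leaving the choice to the viewer.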

Video editing tools designed to automatically recognise entities in digital video files may be adapted to mark up entities for help purposes. Video editing tools may be designed to support human editors in marking up video for help purposes according to whatever criteria they choose, e.g. 'clicking on the top right-hand side of the screen pauses the video and provides a synopsis of the story plot so far'.

There is no Associative Look-Up function in the GAVE, and no %S parameter. Help information is provided only via the Indexed Look-Up, In Line and Externally Defined Help Functions. Otherwise the GAVE behaves analogously to the GATE.

Role of Vendors

Three distinct roles for vendors of products and services are envisaged for the Invention:

• Content publishers are owners or providers of source texts or videos that may be marked up in order to provide help information during reading of the texts or viewing of the videos. Mark-up may enhance the value of these texts.

• IT vendors of products that enable readers to read text or view a video on an electronic screen, such as a browser, electronic book, electronic foreign language assistant or video player, may include in their products the ATE, GATE, AVE or GAVE processes that embody the method of the Invention.

• Independent mark-up vendors or standards organizations may provide Help Information Corpus files, definitions of mark-up standards, associated Externally Defined Help Functions and mark-up software or services in order to mark up source texts or videos for varieties of help purposes.

A particular vendor may perform a combination of these roles and offer packages of products and services.

Claims

1. A method of associating functional expressions with source information in an electronic format, by means of mark-up of the information that is itself not detectable by the user of the information but which serves to delimit those portions of the information to which the functional expressions apply, for the purpose of enabling the invocation of functional expressions, usually by the user referring to the delimited portion of the information using an input device and by selecting from a menu when more than one functional expression applies to the portion of information referenced.

2. A method as claimed in Claim 1 where the mark-up of the source information is contained in a file separate from the source information file.

3. A method as claimed in Claim 1 or Claim 2 where the source information is textual information usually displayed on an electronic screen.

4. A method as claimed in Claim 2 where the source information is video information, usually displayed on an electronic screen.

5. A method as claimed in any of the preceding claims where the execution of the functional expression produces additional information in a textual format for presentation to the user, usually as a result of a search or look-up, usually parameterised by the content of the mark-up or the content of the delimited portion of source information or both, in a file or database of additional textual information.

6. A method as claimed in any of Claims 1 to 4 where the execution of the functional expression produces additional information in a video format for presentation to the user, usually as a result of a search or look-up, usually parameterised by the content of the mark-up or the content of the delimited portion of source information or both, in a file or database of additional video information.

7. A method as claimed in any of Claims 1 to 4 where the execution of the functional expression produces additional information in an audio format for presentation to the user, usually as a result of a search or look-up, usually parameterised by the content of the mark-up or the content of the delimited portion of source information or both, in a file or database of additional audio information.

8. A method as claimed in Claims 5, 6 or 7 where the additional information is designed to provide the user with help in performing some function or acquiring some skill.

9. A method as claimed in Claim in which the additional information is designed to provide help with foreign language understanding or learning.

10. A software process, called the General Automated Text Expert (GATE), which by means of parameterisation with: - the set of mark-up tags used to mark-up source text in the application; - the set of associations of menu labels with mark-up tags, which may be presented to the user when the user requests the execution of a functional expression and there is more than one option; - the set of associations of functional expressions with mark-up tags, which may themselves be parameterised with delimited portions of the source text, parameters derived from the mark-up of delimited portions of the source text and files or databases of additional information; - a set of externally defined functions for use in functional expressions, usually for the purpose of providing help; enables the creation of computer applications embodying any of the methods claimed in Claims 1, 2, 3, 5, 6, 7, 8 or 9.

11. A software process, called the General Automated Video Expert (GAVE), which by means of parameterisation with: - the set of mark-up tags used to mark-up source video in the application; - the set of associations of menu labels with mark-up tags, which may be presented to the user when the user requests the execution of a functional expression and there is more than one option; - the set of associations of functional expressions with mark-up tags, which may themselves be parameterised with delimited portions of the source video, parameters derived from the mark-up of delimited portions of the source text and files or databases of additional information; - a set of externally defined functions for use in functional expressions, usually for the purpose of providing help; enables the creation of computer applications embodying any of the methods claimed in Claims 1, 2, 4, 5, 6, 7, 8 or 9.

12. A semi-automatic software process for associating successive portions of a source text file with portions of an associated text file from chosen starting points within the two files, usually the start of the files initially, by marking up successive portions of the source text with usually unique indices, either in the text or in a separate mark-up file, and creating a corresponding index file of text portions from the associated text file, each of which is usually a help text portion corresponding to an indexed source text portion, as a result of:
- aligning the structure of the source and associated texts (start of the text files, end of the text files, chapter, page and paragraph boundaries) and punctuation in the source and associated texts (sentence and phrase boundaries) from the current starting points in each file;
- proposing an association between the next portion of the source text and the next portion of the associated text from the current starting points, usually of the next phrase or sentence in each text, to a human reader usually at an electronic screen;
- allowing the human reader to either:
• confirm the proposed association, which thereby confirms the mark-up of the source text portion and augmentation of the index file of associated text portions and establishes new starting points for the next association proposal in both the source and associated text files at the character positions beyond the text portions of the current association proposal, or
• adjust the association of the text portions manually on the electronic screen by means of an input device and thereby define the mark-up of the source text portion and augmentation of the index of associated text portions and establish new starting points for the next association proposal in both the source and associated text files at the character positions beyond the manually adjusted associated text portions, or
• skip the association and thereby establish new starting points for the next association proposal in both the source and associated text files at the character positions beyond the skipped text portions in each file, or
• otherwise set new starting points within the source text file and associated text file;
- repeating the process until the end of the source file is reached or until the human reader terminates it.

13. A semi-automatic software process as claimed in Claim 12 for the purpose of synchronising a source text file with its translation in a foreign language text file.