DATA ENCODING AND DECODING SYSTEMS
The present invention relates to data encoding and decoding systems. The invention relates particularly, but not solely, to systems for enabling data to be transferred from one computer to another, for example systems in which the first computer encodes data and prints it on a document in machine-readable form, and the encoded data is read from the printed document and decoded by the second computer.
There are a number of common situations in which a document is created and printed by a computer, then the document is sent to another location where, in a completely manual operation, selected data is read from the printed document and entered into a second computer: the two computers may well be different and incompatible with each other. For example, in accounting systems, data is entered manually via a keyboard into a computer and printed out in the form of an invoice: the invoice is then sent by mail to a customer, who enters data from the invoice into his own accounts computer, which runs an accounting package to appropriately allocate financial records. The process of manually entering data into the recipient's computer is time consuming and labour intensive. Moreover, errors and omissions are inevitable and usually a laborious reconciliation process is necessary to identify and rectify these. There have been attempts to link accounts computers via electronic mail, but the success of these arrangements has been limited, mainly owing to the lack of compatibility, in both hardware and software terms, between the different computers.
It has occurred to us that it would be theoretically possible to use existing bar code technology for the transfer of data from one computer to another via a printed document. In practice however, the use of bar codes for this purpose would present substantial problems. Firstly, the printers most commonly used in industry and commerce, particularly for printing invoices, are dot matrix printers and these are not suited to printing bar codes. Secondly, bar codes occupy a relatively large area such that if used to convey significant amounts of data, it is difficult if not impossible to fit the bar codes onto the printed document.
We have now devised systems which overcome the above problems and enable data to be encoded and marked onto a substrate in machine-readable form on a small area of the substrate, and for that data to be read subsequently from the substrate, converted into electronic form and decoded.
In accordance with the present invention, as seen from a first aspect, there is provided a system for forming a substrate with machine-readable markings representing data in encoded form, the system comprising a microprocessor arranged to convert alphanumeric data into encoded data, and means for forming a substrate with the encoded data in the form of at least one line of successive characters, each said character being defined by a series of localised dot-shaped marks, said series extending generally perpendicular to said line.
The above-defined system is able to form the substrate with machine-readable markings which occupy a small area yet represent a large amount of data. Typically the substrate may carry other markings in alphanumeric, human-readable form, covering perhaps a major part of the front surface area of the substrate, and the machine-readable markings then occupy a small area compared to such other markings. In one form of the invention, the substrate to be formed comprises a document and the means for forming the substrate with the encoded data comprises a document printer. The system is particularly suited to the use of a dot matrix printer, preferably a nine-pin dot matrix printer: each character of the encoded print is thus defined by a unique pattern of dots at selected positions corresponding to dot positions of the nine-pin dot matrix printer. Such printers are commonly in use and have the advantage that they can print on multipart stationery: however, other types of printer may be used if desired.
In another form of the invention, the substrate comprises a label or other article of textile material, and the means for forming the substrate with the encoded data comprises a machine which forms the required dot-shaped marks by a sewing, stitching, embroidery, weaving or the like process, using a thread of a colour which contrasts with the textile label or other article itself. For example, the substrate may comprise a label to be stitched into a garment, and carrying marks typically identifying the manufacturer of the garment, the size of the garment, the material from which the garment is made, and washing instructions for the garment. The encoded data, formed on the label in accordance with the present invention, occupies a small area and can uniquely identify the garment and include any desired information related to the garment.
Also, in accordance with the present invention, as seen from a second aspect, there is provided a substrate which is provided with machine-readable markings representing data in encoded form and comprising at least one line of successive characters, each said character being defined by a series of localised dot-shaped marks, said series extending generally perpendicular to said line.
Further in accordance with the present invention, as seen from a third aspect, there is provided a decoding system for reading encoded data from a substrate, in which the encoded data comprises at least one line of successive characters each defined by a series of localised dot-shaped marks, said series extending generally perpendicular to said line, the decoding system comprising an optical scanning device for scanning an encoded data image carried on said substrate, the decoding system including means for decoding the scanned image.
The decoding system preferably comprises a scanning device coupled to a computer, typically a personal computer. Preferably the scanning device is provided with processing circuitry, which may be incorporated in the scanning device itself or on an add-on board for installation in the computer. This processing circuitry preferably comprises a memory or store to which the scanned image is written, and a microprocessor arranged then to analyse the stored image. Preferably the scanner microprocessor checks the stored image for orientation (or skew) and corrects this as necessary. Preferably the scanner microprocessor checks the encoded data for validity. Preferably the scanner microprocessor decodes the successive characters and passes a stream of decoded characters to the host computer.
Preferably the encoding system is arranged to generate identifying characters for the respective fields of data, and these identifying characters are printed in association with (preferably preceding) their respective encoded data fields. Then the decoding system includes these identifiers in the decoded stream of data, so that the host computer of the decoding system is able to allocate the different items of data to appropriate files and/or fields.
It will be appreciated that the decoding system may be used regardless of the nature or material of the substrate on which the encoded data is carried, and regardless of the manner in which the dot-shaped markings have been formed on that substrate. For example, and as mentioned previously, the substrate may comprise a printed document. As another example, also mentioned previously, the substrate may comprise a label or other article of textile material which has been formed with the dot-shaped markings by a sewing, stitching, embroidery, weaving or the like process.
Preferably in the systems of the present invention, each encoded character is based on a nine-dot code. Of the nine dot positions, preferably seven are used to define the character (as a unique combination of dot presences and absences). Preferably an eighth dot position is used as a parity bit, thus enabling individual character validation in the recipient system. Preferably the final dot position of the nine is used to form a separating line between adjacent lines of characters and thus provide a reference feature for the skew correction facility.
The above-described encoding arrangements provide a sufficient number of combinations to exceed the ASCII code set.
For certain applications, for example pure text documents, the use of the eighth parity dot is unnecessary and the eighth dot position can instead be used to extend the possible character set to give up to 256 characters. This enables commonly occurring words or phrases in text to be represented by a single character thus compressing the data.
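By way of illustration only, the extended character set described above may be sketched as follows. The sketch assumes that codes 0 to 127 carry ordinary 7-bit characters and that codes 128 to 255 are assigned to commonly occurring words or phrases; the phrase table `PHRASES` and the function names are hypothetical and are not taken from the specification.

```python
# Hypothetical phrase dictionary: each entry is given one code in the range 128..255.
PHRASES = ["the ", "and ", "invoice", "payment"]
PHRASE_CODES = {phrase: 128 + i for i, phrase in enumerate(PHRASES)}
CODE_PHRASES = {code: phrase for phrase, code in PHRASE_CODES.items()}


def compress(text):
    """Convert text to a list of 8-bit codes, replacing known phrases by single codes."""
    codes = []
    pos = 0
    # Try longer phrases first so that the longest match wins.
    candidates = sorted(PHRASES, key=len, reverse=True)
    while pos < len(text):
        for phrase in candidates:
            if text.startswith(phrase, pos):
                codes.append(PHRASE_CODES[phrase])
                pos += len(phrase)
                break
        else:
            code = ord(text[pos])
            if code > 127:
                raise ValueError("plain characters must be 7-bit in this sketch")
            codes.append(code)
            pos += 1
    return codes


def expand(codes):
    """Inverse of compress: turn a list of 8-bit codes back into text."""
    return "".join(CODE_PHRASES[c] if c >= 128 else chr(c) for c in codes)
```

In this sketch each code would still be printed as a single column of dots, so every phrase replaced shortens the printed line by the length of the phrase less one character.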
It will be appreciated that, for printed documents, the encoding system of the invention is particularly well suited to the use of a nine-pin dot matrix printer, such that each character of the printed code corresponds to a single position of the printer head. With simple dot matrix printers, this sets the resolution required from the scanner at approximately seventy-five dots per inch. Present optical scanner technology achieves a resolution of six hundred dots per inch, such that the scanner in accordance with this invention is able to perform at high levels of accuracy.
A typical line of alphanumeric data can accordingly be printed by a dot matrix printer, in the above-described dot coded form, with a compression factor of approximately six-to-one. Moreover, since successive lines of dot coded characters may be printed without the usual spacing between lines, a further two-to-one improvement in the compression factor can be achieved. Thus, an overall compression factor of approximately twelve-to-one can be achieved.
In the case of text documents, the use of the eighth dot position to extend the character set as described above results in further compression of the code. In such cases the code may be printed in a series of blocks in an enlarged right hand margin, each block corresponding to and aligned with a paragraph of text. Such an arrangement enables selective acquisition of text from the document by the operator of the recipient system.
In the case of many printed documents, for example invoices and statements, further compression is possible because significant amounts of data can be omitted from the coded print: the corresponding data can instead be recalculated or expanded by the recipient system (either in the scanning device processing circuitry or in the host computer). For example, an invoice would normally be printed with costs net of VAT, VAT itself and the total costs inclusive of VAT for each item: however, the coded data might simply include the costs net of VAT and a single-digit VAT code; this encoded information is sufficient for the recipient system to recalculate all of the other data items. By utilising such techniques, a further compression factor of approximately two-to-one can be achieved: thus for invoices, statements and certain other types of documents, an overall compression factor of approximately twenty-four-to-one can be achieved.
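The recalculation of omitted invoice data by the recipient system may be sketched as follows. This is a minimal illustration only: the table `VAT_RATES`, its codes and its rates are hypothetical placeholders, as is the function name `expand_item`; the specification states only that a net cost and a single-digit VAT code are sufficient.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical mapping from single-digit VAT code to rate (illustrative values only).
VAT_RATES = {"0": Decimal("0.00"), "1": Decimal("0.175")}


def expand_item(net, vat_code):
    """Rebuild the VAT and gross amounts from the two values carried in the code."""
    net = Decimal(net)
    vat = (net * VAT_RATES[vat_code]).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {"net": net, "vat": vat, "gross": net + vat}
```

For example, `expand_item("100.00", "1")` yields a VAT amount of 17.50 and a gross amount of 117.50 under the assumed rate, neither of which needed to be printed in the coded block.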
The encoding system may be arranged to print data items in one language e.g. English, and the recipient system (preferably the scanner processing circuitry) arranged to translate selected items into another language. In this, only individual words or short phrases need to be translated, rather than an entire text, and this can be accomplished by a look-up table included in the scanner processing circuitry. As each field of coded data printed on the document is accompanied by its own identifier, the scanner processing circuitry can respond to the relevant identifiers to translate the corresponding data items. The translated data can be displayed on the host computer monitor and, if required, printed out.
In a similar manner, the recipient system may be arranged to make a currency conversion, in accordance with an exchange rate newly programmed into the system when desired by the user. Thus, an invoice written in English and expressed in pounds sterling may be converted by the recipient system to a document with key text in French and currency in Francs.
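The translation and currency conversion facilities described above may be sketched together as follows. The sketch assumes that the decoded stream has already been split into (identifier, value) pairs, and it borrows the identifiers ":" (description) and "@" (amount net of tax) from the example protocol of Figure 6; the look-up table `TRANSLATIONS`, the exchange rate and the function name `localise` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical look-up table of individual words or short phrases.
TRANSLATIONS = {"Invoice": "Facture", "Blue paint": "Peinture bleue"}

TEXT_FIELDS = {":"}    # identifiers whose values are translated
AMOUNT_FIELDS = {"@"}  # identifiers whose values are currency amounts


def localise(fields, rate):
    """Translate selected text fields and convert amount fields at the given rate."""
    result = []
    for ident, value in fields:
        if ident in TEXT_FIELDS:
            # Unknown phrases are passed through unchanged.
            value = TRANSLATIONS.get(value, value)
        elif ident in AMOUNT_FIELDS:
            converted = Decimal(value) * rate
            value = str(converted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
        result.append((ident, value))
    return result
```

Because only the fields named by their identifiers are touched, all other data (for example invoice numbers) passes through unaltered.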
It will be appreciated that in order to provide security, the data can be encrypted before encoding and printing onto the document. The recipient system is then arranged to decrypt the data after reading the latter from the printed document and decoding it.
The encoded data is preferably printed as a block of several successive lines without spacing between adjacent lines. However, preferably the block is printed with a distinctive feature of shape, so that it can be identified as authentic: for example the block may have a distinctive outline shape, or it may include a void area of distinctive shape. The distinctive feature of shape may enable the data block to be identified and authenticated visually, or by the scanner processing circuit.
Preferably the block of data is enclosed by a continuous line forming an identifiable shape to aid the skew correction process and enable the processor to determine the number of lines in the block of data. It is further preferred that the continuous line be in the form of a rectangle.
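One possible way of using such a reference line for skew correction is sketched below. The specification does not state how the skew is computed; the sketch assumes, purely for illustration, that the scanner has extracted the (x, y) coordinates of dots lying along one nominally horizontal edge of the enclosing line, fits a straight line to them by least squares, and rotates the image coordinates by the opposite angle. The function names are hypothetical.

```python
import math


def estimate_skew(points):
    """Return the angle (radians) of the least-squares line through the reference dots."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.atan2(sxy, sxx)


def deskew(points, angle):
    """Rotate points about the origin by -angle so the reference line becomes horizontal."""
    c, s = math.cos(-angle), math.sin(-angle)
    return [(x * c - y * s, x * s + y * c) for x, y in points]
```

A practical scanner would apply the same rotation to the whole stored bit map before sampling the dot positions; only the geometry is shown here.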
To minimise the overall size of the data block, it is preferred that wraparound techniques be employed at the line ends.
Embodiments of the present invention will now be described by way of examples only and with reference to the accompanying drawings, in which:
FIGURE 1 is a schematic diagram showing systems employed, in accordance with the present invention, for producing an invoice at an originating location, and for processing the invoice at a recipient's location;
FIGURE 2 is a diagram to show an example of encoding data in accordance with the present invention;
FIGURE 3 is one example of character encoding key or font used in systems in accordance with the present invention;
FIGURE 4 shows an invoice printed with normal alphanumeric data and also printed with a block of encoded data;
FIGURE 5 is a schematic block diagram of a scanning device used to read the encoded data from an invoice; and
FIGURE 6 is a diagram showing an example of format for the encoded data.
Referring to Figure 1 of the drawings, at a location originating an invoice, the invoice data is entered on a keyboard to a computer 10 which prints the invoice on a dot matrix printer 12. The computer 10 is provided with an accounts software package of any conventional type, for originating the invoice from the data entered on the keyboard. The computer 10 is further provided, for example in the printer driver, with encoding software which accesses output files on the computer 10, then encodes the corresponding invoice data, organises this into a predetermined format, and controls the printer 12 so that the encoded data is printed onto the invoice, in a small area of the document which is free of other printed matter. Thus, the invoice is printed with the usual alphanumeric data in human-readable form, but is in addition printed, in a small area, with some or all of the same data in a machine-readable encoded form, the details of which will be explained below.
Still referring to Figure 1 of the drawings, at a recipient location to which the invoice is sent, usually in the mail, a computer 20 is provided with a scanner 30 for reading the machine-readable encoded data from each invoice. The scanner 30 is arranged to decode this data, and pass it to the computer 20 where it may be displayed for verification and allocated to appropriate files and/or further processed, as will be described below.
Referring to Figure 2, the encoding of data is carried out on a character-by-character basis: each character (in a horizontal line of characters) is converted into a single vertical line of dots, the positions of which correspond to selected positions of the nine pins of the print head of a nine-pin dot matrix printer. Thus, Figure 2 shows horizontal lines A of normal alphanumeric printing, together with horizontal lines B of the same data printed in encoded form, and also at C the encoded lines on an enlarged scale.
Figure 3 shows one example of character encoding key or font. In this, and for each character, the upper seven dot positions define the character itself (in a unique combination of present and absent dots), the eighth dot position defines a parity bit, and the ninth dot position provides a separating line between successive encoded lines. In this example, the parity bit ensures that each vertical line contains an even number of dots in total.
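The nine-dot column structure just described may be sketched as follows. Two points are assumptions made for the purpose of illustration: the seven character-defining dots are taken to be the 7-bit ASCII code of the character (the actual key of Figure 3 is not reproduced here), and the even-parity rule is taken to count all nine positions, including the always-present ninth dot. The function names are hypothetical.

```python
def encode_char(ch):
    """Return nine dot flags, top to bottom, for one character (1 = dot printed)."""
    code = ord(ch)
    if code > 127:
        raise ValueError("only 7-bit characters are supported in this sketch")
    # Positions 1-7: character pattern (assumed: ASCII code, most significant bit at top).
    data = [(code >> bit) & 1 for bit in range(6, -1, -1)]
    # Position 9: always printed, so that successive columns form the separating line.
    separator = 1
    # Position 8: parity dot, set so the whole column holds an even number of dots.
    parity = (sum(data) + separator) % 2
    return data + [parity, separator]


def render(text):
    """Return nine strings, one per pin row, with '#' for a dot and '.' for no dot."""
    columns = [encode_char(ch) for ch in text]
    return ["".join("#" if col[row] else "." for col in columns) for row in range(9)]
```

Each character thus occupies one column of the printed line, and the bottom row of `render(...)` is a solid run of dots forming the separating line.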
Figure 4 shows a typical invoice which is printed with alphanumeric information in usual manner, in this case including details of several products being purchased, the individual prices and corresponding VAT values, and the relevant totals. Additionally, and in accordance with this invention, the invoice is printed with corresponding encoded data: this is printed as a compact, rectangular block D occupying a small area of the invoice, which is free of other printed matter. An indicator, in the form of an arrow or pointer P, is printed adjacent the block to indicate its presence to the recipient of the invoice, and to indicate the direction in which the block D should be scanned.
The scanner may comprise a flatbed device or a hand held device. The scanner is provided with processing circuitry, either incorporated in the device itself or on an add-on board to be installed in the host computer. Figure 5 shows the scanner circuitry in schematic form. Thus, the output of the optical read head 32 of the scanner is passed to a bit map store 34. The scanner further comprises a microprocessor 36 controlled by software held in a read only memory (ROM) 38. Optionally a further read only memory 39 holding a language look-up table may be provided, preferably with the facility of being easily added or removed from the system e.g. as a plug-in component. In use, the scanned image is written into the store 34, then the microprocessor 36 analyses the stored image. Thus, firstly the image is checked and corrected for orientation, using predetermined datum points or lines in the image, and is then checked for completeness. Next the microprocessor 36 analyses the data line-by-line and checks the character validation of each individual character, before decoding the characters, then passing the stream of decoded characters to the host computer. However, if at any stage the scanner circuitry detects an error, it discards the stored image and prompts the user to scan the document again. Preferably the scanner includes a marker to mark the document, e.g. with a red mark, once the code block has been successfully read, to show that the document has been read.
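The character validation and decoding steps performed by the microprocessor 36 may be sketched as follows. The sketch assumes that the stored image has already been reduced to columns of nine dot flags, that the upper seven flags are a 7-bit ASCII code with the most significant bit at the top, and that even parity is counted over all nine positions; these are illustrative assumptions, and the names `ScanError`, `decode_column` and `decode_line` are hypothetical.

```python
class ScanError(Exception):
    """Raised when a check fails; the stored image is discarded and a rescan requested."""


def decode_column(column):
    """Validate one nine-dot column and return the character it represents."""
    if len(column) != 9:
        raise ScanError("incomplete column")
    if column[8] != 1:
        raise ScanError("separating-line dot missing")
    if sum(column) % 2 != 0:
        raise ScanError("parity check failed")
    code = 0
    for bit in column[:7]:
        code = (code << 1) | bit
    return chr(code)


def decode_line(columns):
    """Decode one encoded line; any invalid column aborts the whole read."""
    return "".join(decode_column(col) for col in columns)
```

A caller would catch `ScanError`, discard the stored image and prompt the user to scan the document again, as described above.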
As will be explained below, each item of data, or data field, is preceded by an identifying character or code, which is generated by the originating computer system. The stream of data passed by the scanner to the host computer, at the recipient location, includes these identifiers, so that the host computer is able to allocate the different items of data to the correct fields in the correct files of its data store. In particular, the decoded invoice data may be displayed on the computer monitor, and it is also allocated to the appropriate data storage files of that computer's accounts package.
Figure 6 shows one example of data record, illustrating the use of a particular example of system protocol. The protocol used in this example divides the fields into two main types, header line fields and item line fields. The header line fields contain data items relevant to the invoice as a whole. Details of each individual item on the invoice are held in the item line fields: the number of item line fields corresponds to the number of items on the invoice. In the example shown:
* denotes the header line control field
E denotes the language (English in this case) of the text
! is a "From" field control character
# is the VAT number field control character
% denotes the invoice date field
& denotes the order number field
( denotes the invoice number field
+ denotes the item line control field
) is a reference field control character
is a suggested cost code (generated by the encoding system and can be used as a default by the recipient system)
: is a description control code
@ is the amount (net of tax) control field
? is a tax code control character.
It will be noted that several of the above characters in Figure 6 are followed by one or two numerical characters giving the number of characters to be found in the corresponding field. In the example shown in Figure 6, the encoded block is printed in a format consisting of successive lines each of a maximum of 40 characters.
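The general idea of identifier-prefixed, length-counted fields wrapped into lines of at most 40 characters may be sketched as follows. The layout used here is a simplification and not the format of Figure 6: it assumes that every field consists of one control character, a two-digit character count and then the field content, whereas in the example protocol only some control characters are followed by a count. The function names are hypothetical.

```python
def build_record(fields):
    """Concatenate (identifier, value) pairs as identifier + 2-digit length + value."""
    parts = []
    for ident, value in fields:
        if len(ident) != 1 or len(value) > 99:
            raise ValueError("identifier must be one character and value at most 99")
        parts.append(f"{ident}{len(value):02d}{value}")
    return "".join(parts)


def wrap(record, width=40):
    """Split the record into lines of at most `width` characters (wraparound at line ends)."""
    return [record[i:i + width] for i in range(0, len(record), width)]


def parse_record(lines):
    """Rejoin wrapped lines and split the stream back into (identifier, value) pairs."""
    record = "".join(lines)
    fields, pos = [], 0
    while pos < len(record):
        ident = record[pos]
        length = int(record[pos + 1:pos + 3])
        value = record[pos + 3:pos + 3 + length]
        if len(value) != length:
            raise ValueError("truncated field")
        fields.append((ident, value))
        pos += 3 + length
    return fields
```

Because each field announces its own length, a field may run across a line end without any special marker, and the recipient system can detect a truncated block as incomplete.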
It will be appreciated that the systems which have been described enable data to be transferred from one computer to another (in which the two computer operating systems and applications software may be quite different and normally incompatible with each other) without the need for manual re-entry of data at the recipient location. By enabling automatic entry of data from the printed document to the computer at the recipient location, the speed, accuracy and reliability of the data transfer are all substantially enhanced. Because the encoded data is printed on the document in a predetermined format, the scanner processing circuit can be arranged to anticipate this particular format. This enables the scanner to operate more easily and quickly in checking the scanned image, identifying and correcting for orientation errors, checking individual characters for validity and data fields for completeness and authenticity. This factor contrasts with general-purpose scanning systems, which do not have the benefit of a verifiable font or fixed format, but have to interpret the scanned image and decide upon the type of font being used and the character spacing etc.
It will be appreciated that whilst the above description relates to systems for the transfer of accountancy data, the invention has much wider application. In particular, it is envisaged that the systems will have application to computer generated tickets, driving licences, identity cards, bank statements and as an efficient interface for commercially printed material, for example between TV listings and TV video recorders. Further applications include data transfer between different computer systems via printed paper where total confidentiality and security of data are required.
Furthermore, whilst the above description relates to encoding by printing on documents, the invention is generally applicable to the formation of the machine-readable encoded data on any substrate, regardless of the nature or material of that substrate and the manner in which the encoded data is marked on the substrate. For example, as previously mentioned, the substrate may comprise a label of textile material, of the type which is sewn into a garment and carries alphanumeric information in human-readable form: in this case, the encoded data may be marked on the label in a process of stitching, sewing, embroidery, weaving or the like, using a thread of a colour contrasting with the label itself. The encoding can be effected by the same process, and at the same or a subsequent time, as the process for forming the label with its main, human-readable information.
It will be appreciated that the above-described scanning device can be used to read the encoded data, regardless of the nature of the substrate and the manner in which the substrate has been marked with the encoded data.