WO1990003012A2 - Image recognition - Google Patents
Image recognition Download PDFInfo
- Publication number
- WO1990003012A2 WO1990003012A2 PCT/GB1989/001043 GB8901043W WO9003012A2 WO 1990003012 A2 WO1990003012 A2 WO 1990003012A2 GB 8901043 W GB8901043 W GB 8901043W WO 9003012 A2 WO9003012 A2 WO 9003012A2
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pixel
- bit map
- image
- group
- discriminators
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/196—Recognition using electronic means using sequential comparisons of the image signals with a plurality of references
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- the invention relates to methods and apparatus for recognising two dimensional images, such as text characters, represented in binary form as a bit map of pixels.
- Template (mask) matching or Matrix matching where the image of the character is compared with a set of stored prototype images to achieve a match and recognise the character.
- the technique is constrained by the amount of computer memory required to store the different fonts, it requires the character font to be known to the product, it requires well-defined characters, it does not learn from its mistakes.
- feature analysis where the shape and features of a character image are examined in order that an algorithmic match may be attempted.
- Such a technique has a high degree of font independence and it has a learning capability.
- Problems exist with poorly defined, distorted or broken characters (images) such as are met with in every-day print since these distortions affect the features by which the character is to be recognised.
- N-tuple classifiers which were originally described in a paper entitled “Pattern Recognition and Reading by Machine” by Bledsoe and Browning, 1959 Proceedings of Eastern Joint Computer Conference, pages 225-232 and which are also described in "Guide to pattern recognition using random-access memories” by Aleksander and Stonham, 1979. Computers and Digital Techniques Vol. 2, No. 1, pages 29 -40.
- the N-tuple method is essentially a means comparing information presented to a system with information already "learnt" by the system, so that the system can then make “most like” decisions. This methodology has an ability to cope with the recognition of patterns and shapes including multi-font character recognition.
- image recognition apparatus comprises a first synchronous state machine for segmenting a number of images defined in bit map form into separate pixel groups; and a second synchronous state machine to which each pixel group is applied for classification.
- each pixel group found by the segmenting state machine will correspond to an image which can be classified.
- the inventors have realised the unique merits of the N-tuple method in coping with the variable print quality of "real-world” documents.
- An important feature of the invention is that the inventors have realised the unique and significant advantages in using a synchronous state machine approach, to implement much of the core technology recognition function. This approach is particularly advantageous when utilising a technology development based on the N-tuple method of pattern recognition.
- the core technology recognition function comprises:
- the segmentation process is coupled with:
- Registration The process of providing positional information, to register the relationship of the individual segmented images, thereby allowing the "recognised" characters to be assembled into a data-stream to an appropriate format.
- (b) Classification The process of classifying the images into pre-defined classes.
- the classification process includes the means of handling cases where the classifier: (i) is unable to make a true decision, i.e. a reject error; in this case the classifier should label the result accordingly.
- the system has to recognise the error from other information such as the context.
- a synchronous state machine is one where the stages, of the processes, are stepped-on simultaneously, under control of a system clock.
- the time penalties may be avoided which can occur in asynchronous machines, associated with the processing of each stage, for example due to interrupt routines, polling routines hand-shaking routines and the like.
- a state machine approach for this image recognition application of the N-tuple method of pattern recognition, allows the use of a hardware implementation. This allows a much higher speed of image recognition (than can be achieved by predominately a software approach) at the moderate product prices which are associated with the software based products.
- a method of recognising images represented by respective digital pixel groups comprises presenting each pixel group to an N-tuple classifier having a number of discriminators each adapted to recognise a respective class of a predetermined group of classes and is characterised in that each pixel group is presented to the discriminators in a predetermined sequence; and in that as soon as the output from a discriminator satisfies a recognition condition, the presentation of the pixel group to the classifier is terminated.
- apparatus for recognising images represented by respective digital pixel groups comprises an N-tuple classifier including a number of discriminators each adapted to recognise a respective class of a predetermined group of classes and to which the pixel groups are presented, the apparatus being arranged to present each pixel group to the discriminators in a predetermined sequence; and recognition means for monitoring the output of the discriminators and for terminating the presentation of the pixel group to the classifier as soon as the output from a discriminator satisfies a recognition condition.
- the method comprises comparing the output from each discriminator with a threshold, the recognition condition being satisfied when the threshold is exceeded.
- the situation is:
- the image is identified (recognised).
- each pixel group may be presented to the discriminators in the order of frequency of occurrence of the classes represented by the discriminators. For example, where the images comprise text characters originating from an English language text, the pixel groups may initially be presented to discriminators representing the vowels (as being the commonest occurring letters in the English language) and subsequently to other groups of classes having successively decreasing frequencies of occurrence.
- the discriminator or discriminators to which each pixel group is applied may be chosen in accordance with the location of the pixel group defining the images within the context of the previously detected images. For example, in the case of text, if a full stop has been detected then it would be expected that the next letter is upper case and thus the next pixel group will be presented initially to the group of classes defining the upper case letters.
- the concept of interaction between the classifier and the recognition process is also utilised in a method according to a fourth aspect of the invention for recognising images represented by respective digital pixel groups, the method comprising presenting each pixel group to an N-tuple classifer having a number of discriminators each adapted to recognise a respective class of a predetermined group of classes characterised in that if none of the discriminator outputs satisfies a recognition condition but it is determined that the pixel group defines an image falling within a group of the classes, the method further comprises presenting a portion of the pixel group to a subsidiary N-tuple classifier having a number of subsidiary discriminators each adapted to recognise a respective portion of the group of classes.
- apparatus for recognising images represented by respective digitial pixel groups comprises an N-tuple classifier having a number of discriminators each adapted to recognise a respective class of a predetermined group of classes and to which each pixel group is presented; recognition means for monitoring the outputs of the discriminators; and a subsidiary N-tuple classifier having a number of subsidiary discriminators each adapted to recognise a respective class of a predetermined group of classes defining portions of a respective group of images, the recognition means being adapted to present a portion of a pixel group to the subsidiary classifier if it is determined that the discriminator outputs do not satisfy a recognition condition but the discriminator outputs define an image falling within the group of classes.
- the method preferably further comprises storing data defining the recognised class of the image represented by the pixel group for which purpose the apparatus preferably further comprises storage means.
- each pixel group is presented simultaneously to groups of two or more discriminators in the classifier and, where appropriate, the subsidiary classifier.
- bit map described will generally have a data bus width of one bit.
- the (image) processing tasks require access to a memory system in an incremental fashion and because this is a pixel by pixel addressable task, using dedicated logic circuits, it is more efficiently organised with the memory (bit map) organised with a data bus width of one bit.
- the first port being a conventional memory access port designed to suit a particular microprocessor bus, e.g. an eight bit wide data bus.
- the second port being organised to have a data bus width of one bit, with an addressing system providing fcr an incremental addressing system allowing for both positive and negative displacements in two axes, since it is desired to access individual pixels stored in a two dimensional array.
- a method of segmenting images represented in bit map form comprises scanning the bit map to determine the maximum extents of an image in first and second orthogonal directions and recording for each scan line in the first direction the coordinates of the extreme pixels of the image in the second orthogonal direction; and selecting as defining an image only those pixels within a rectangle defined by the previously determined extents and falling within the previously determined extreme pixel coordinates.
- apparatus for segmenting images represented in bit map form comprises scanning means for scanning the bit map to determine the maximum extents of an image in first and second orthogonal directions and for recording for each scan line in the first direction the coordinates of the extreme pixels cf the image in the second orthogonal direction; and selection means for selecting as defining an image only those pixels within a rectangle defined by the previously determined extents and falling within the previously determined extreme pixel coordinates.
- This method and apparatus is able to cope with overlapping and proportionally spaced images.
- the pixel group comprising the image block is divided into sub-blocks by an estimation of the character boundaries within the pixel block, (representing two or more character pixel groups), for example by a knowledge of the character aspect ratio (obtained from an histogram analysis of the text), each sub-block is then submitted separately to a classification process.
- the scanning of the bit map is carried out in a series of horizontally spaced, vertical scan lines and this leads to the ability to compensate for skew from a knowledge of the line spacing or pitch deduced from a histogram analysis of the page of text.
- the selecting step comprises scanning the bit map in a series of lines extending in a second orthogonal direction and spaced apart in the first orthogonal direction, each line having a length corresponding to the distance between the respective extreme pixel coordinates.
- a method of segmenting images represented in bit map form comprises
- step b) recording the location of those pixels in the bit map which define the detected shape; and repeating the steps a) and b) to locate other images while ignoring in step a) each pixel whose location has been recorded in a step b).
- apparatus for segmenting images represented in bit map form comprises scanning means for scanning the bit map to detect a shape which may comprise an image; and a memory for recording the location of those pixels in the bit map which define the detected shape, whereby the scanning means only responds to those pixels of the bit map whose locations have net been recorded in the memory.
- step b) comprises providing a second bit map coterminous with the bit map defining the images, and recording in the second bit map those pixels which have been found during the scanning step to correspond to a detected shape.
- means are provided to ignore isolated black pixels, during the scanning process, as unwanted background noise.
- An isolated black pixel is one where all its (eight) neighbours are white pixels.
- the images with which the invention is concerned may include characters, such as text characters ( numbers and alpha characters) both arabic and ncn-arabic, and also other two dimensional predetermined patterns and shapes as for example obtained by robot manipulators carrying video cameras.
- the bit map defining the images may be generated in any conventional manner such as by means of a CCD array, a video scan and subsequent digital processing and the like.
- FIG. 1 illustrates the overall system
- Figure 3 is a flowchart illustrating the operation of the computer control system
- Figure 4 is a block diagram of the image preprocessing circuit
- FIG. 5 illustrates the memory system
- Figure 6 is a block diagram of the scan-search circuit
- FIG. 7 illustrates the segmentation system
- FIG. 8A-8D illustrates the extraction process
- Figure 9 illustrates the datum conditions for an extracted shape
- FIG. 10A-10B illustrates the scaling and normalising functions
- Figure 11 shows an example (in block diagram form) of a variable scaling system
- Figure 12 illustrates an example of a scaling table
- Figure 13A-13B illustrate an an example of N-tuple mapping
- Figure 14 illustrates the classification system
- Figure 15 is a flow chart illustrating the operation of the classification system
- Figure 1 illustrates the OCR (optical character recognition) system, by which indicia, existing in printed or written form, are captured as images and converted to data encoded to a computer industry standard.
- OCR optical character recognition
- a video scanner (1) produces digitised video data, representing the black or white pixel image data, captured from a scanned page, as a line by line
- a scanner video interface ( 2 ) orders the video data into a form for subsequent data processing.
- the video data is sent to a recognition unit (3), the recognition unit output (4) being the indicia (character) data encoded to a suitable computer industry standard, such as ASCII (American Standard Code for Information Interchange).
- the scanner (1) may be any convenient form of commercial optical scanner, with appropriate product capabilities and facilities, that is a paper-handling capability (sheet-feed or flat-bed) at an appropriate image resolution, at an appropriate scan time for a page.
- Commercial scanners generally have an image resolution of 300 dots per inch (dpi), which is
- the scanner video interface (2) may take one of several forms, serial or parallel, e.g. SCSI (Small Computer Systems Interface).
- the scanner (1) may alternatively be constructed
- A4 width CCD charge coupled device photo element array
- a control system consisting of analogue image data capture, thresholding, digital conversion, timing, scan control and interface circuits; this to provide digitised video data, representing the bit image data captured from the scanned page.
- Figure 2 illustrates the overall construction of the recognition unit (3).
- the recognition unit system provides for the segmentation and classification functions, these functions being the processes of breaking the scanned images into separate distinct images for each character, registering the relationship of the individual (segmented) character images and classifying the character images into pre-defined character classes.
- the video data from the scanner (1) is interfaced into the recognition unit (3) by a video interface (5), this video interface (5) may be any of several known forms, to suit the scanner's video interface (2).
- the video data is fed into an image pre-processing circuit (6), which processes the video data into a RAM (random access memory), in the form of an image bit map (7) having a one bit wide data bus.
- a RAM random access memory
- the image bit map (7) operates in conjunction with a shadow bit map (8), having pixel locations in one to one correspondence with the image bit map (7).
- the shadow bit map (8) is used to avoid processing the same pixels several times, such duplicated processing occurs in some known segmentation schemes.
- a scan-search circuit (9) performs a vertical raster scan of the image bit map (7), starting from the top left hand corner of the "page". This is to locate potential characters by searching for black pixels which have not been previously processed, i.e. to search for black pixels not in the shadow bit map (8).
- a synchronous state machine segmentation system (10) is used to extract the character shape, associated with the found black pixel.
- the extracted character shape is fed into a normalise and randomise system function (11).
- the character shape is, by this function (11), normalised in size and converted into a random N-tuple form, which is then loaded into the buffered input of a synchronous state machine classification system (12).
- the classification system (12) identifies (classifies) each character, that is so presented. The identification of the
- the computer control system (13) is also used to control certain aspects of the
- the computer control system (13) uses a commercial micro processor which is software controlled, the mode of operation is illustrated in Figure 3.
- the video data received from the scanner (1) is fed into the image preprocessing circuit (6), via the video interface (5), as initiated by the computer control system (13) (Steps 101 and 102 Fig.3).
- the image pre-processing circuit (6) is shown in more detail in Figure 4.
- the video data is fed to control logic (15).
- the data compression may be arranged to favour white, to enhance the character bit image separation.
- a circuit (18) electronically adds a white border (to define the boundary conditions to be used for
- the video data will require to be loaded [ into the recognition unit (3) ] in more than one data transfer operation, this will be controlled by the computer control system (13).
- the computer control system (13) sets bitmap pointers to the scan "start" position, (Step 105 Fig.3).
- Such memory devices have memories organised with a data bus width which suits a particular micro-processor standard, the commonly encountered data widths being eight, sixteen or thirty-two bits.
- the memory system is associated with the processing of shapes, (image data) contained within that memory; the data to be processed exists as black or white pixels (binary values of picture elements) which are stored in a two dimensional array of single pixel values and which is referred to as a bit map.
- the image data processing tasks require access to the bit map memory system in an incremental fashion, which is a pixel by pixel addressable task. This task is most efficiently organised with the bit map having a data width of one bit, since dedicated mixtures of
- combinational and sequential logic can then be used to achieve higher execution speeds, than could be achieved by a conventional microprocessor access to memory via a multi-bit data bus and software selection.
- the first port representing the
- microprocessor interface (19) is a conventional memory access port, designed to suit a particular microprocessor data bus width, for example using an eight bit wide data bus.
- the second port representing the image processing interface (20) is organised to have a one bit wide data bus, the addressing system of which provides for an incremental addressing, allowing positive and negative movements in the two axes of the memory plane. Such movements (in the two axes of the memory plane) are required in the segmentation process to be described.
- An access arbitration circuit (21) prevents a memory access occurring simultaneously from both the microprocessor and the image processing interfaces; the microprocessor interface (19) is one that can be forced to wait until the memory is ready for it.
- the access arbitration logic ensures that only one set of address and data drivers are energised at any one time, thus preventing an undesirable "clash" of accesses.
- a signal M/S is used to enable one or other set of drivers, within the address multiplex and write
- the address multiplex circuit (22) functions as a selecting switch, according to which interface had the right of access to the memory at any particular time, it allows for the selection of the appropriate address required for any particular memory access.
- a 1 of 8 write decoder (25) is required to enable the image processing interface (20), which handles one pixel at a time, to selectively write one bit within the eight bits of a byte.
- An equivalent function is required for read access be the memory from the image processing
- this interface in order for this interface to be able to select one particular bit from the byte, this is referred to as the 1 of 8 bit select circuit (26).
- the 8 bit datadriver (27) does not require the complexity that would be normally required if a bit selecting interface was to be provided to allow the image processing functions to write bits to memory on an individual basis. This simplification is achieved because,
- the address register (28) holds coordinate pixel location information, corresponding to the coordinates of the pixels representing the scanned image video data.
- the offset address adder (29) is a binary parallel adder circuit, constructed from
- combinational logic it is capable of handling an X or Y offset in positive and negative form in order to be able to address any pixel within the memory array (24).
- the addressed pixel can be either (a) left or right of the horizontal coordinate stored in the address register (29) or (b) above or below the
- the address register (28) performs various operations
- the shadow bit map (8) being initially cleared to zero (all white) which is taken to be the not yet processed state of the scanned image video data.
- Writing to the shadow bit map (8) occurs as the image bit map (7) is conditionally scanned during the segmentation process; that is,to segment the character, the data in the image bit map (7) is read and an identical pixel is written to the shadow bit map (8), thus a copy of the character shape will appear in the shadow bit map (8) indicating that the character shape has been segmented. This conditional scanning of the image bit map (7), to ignore character shapes
- shadow bit map (8) An advantage of the shadow bit map (8) is that the image bitmap (7) is preserved to allow a re-examination of the pixel data, if so desired.
- a transceiver (bidirectional TRANS-mitter and re-CEIVE 30), present in the data path from the memory (24) to the microprocessor interface (19), isolates the memory from other data circuits connected to the micro processor.
- the next step 106 is to initiate the scan-search routine.
- the image bit map (7) is processed by the scan-search circuit (9), shown in more detail in Figure 6. This process operates in conjunction with the segmentation system (10) (to be described).
- the scan process applies a vertical raster scan to the image bit map (7), starting at the top left hand corner (relative to the lines of text on the original scanned document), this position is easily determined relative to the "white border" applied by the image pre-processing circuit (6).
- the raster scan is in the vertical direction downwards, moving left to right and continues until the scan encounters a non-processed "new" pixel, that is a black pixel which does not appear in the shadow bit map (8).
- the first "new" (black) pixel of a character so encountered, by the vertical scan will be the uppermost-leftmost black pixel of that character, the XY position of which will be described as the "found coordinate" of that character.
- the vertical raster scan allows for the handling of skewed lines of text (on the original scanned document), since each character is found in sequence according to the spacings of the lines of text. Knowing the range of spacings of the lines of text and the vertical coordinates of each character then the text may be reconstructed on a line by line basis.
- the vertical spacings may be easily determined from the character positional information derived from the scan process.
- the shadow bit map (8) is scanned pixel by pixel at the same time.
- the comparison-of the binary states of the pixels between the two bit maps is implemented with a 2 input logic gate circuit known as the new pixel selector circuit (31).
- the "found coordinate" of that (found) pixel is loaded into a found coordinate register (32) and a message is sent to the computer control system (13) (step 107, Fig.3).
- the computer control system (13) immediately starts the segmentation process (to be described) to extract the character shape (step 108. Fig.3).
- the character shape is determined (step 109, Fig.3) it is possible to continue the raster scan, since the shadow map is then completed with respect to the
- a re-examination of the image data may be required, such a re-examination may be achieved by either (a) re-scanning the image bit map (7) as a whole, or (b) by re-scanning selected areas by clearing the appropriate areas of the shadow bit map (8) back to zero (white) to allow for the pattern (s) to be found again.
- Patterns within the image bit map (7) can be of
- the shadow bit map (8) ensures that previously processed groups of pixels corresponding to the patterns already
- the segmentation process is initiated, at step 108 Fig.3, by the computer control system (13). As
- segmentation system (10) uses a synchronous state machine (33) ( Figure 7), where combinational transition functions (34) (Fig.7) are used to define the conditions and sequence of the state machine.
- a synchronous state machine is one where every stage of the machine is stepped-on on simultaneously under the control of a system clock.
- Figure 16 illustrates the use of a combinational transition function (34), which allows for conditional decisions to be made at every step and acts as a combinational logic array, to set the conditions and hence decide the "next state" out of the state
- the combinational transition functions (34) may reside either (a) in a non-volatile memory such as PROM
- the search proceeds around the outside of the boundary and finishes when the start pixel (i.e. the found coordinate) has been returned to. Whilst this search is occurring two measurements are taken, (a) for the size and (b) for the profile of the shape.
- the first measurement uses a system of peak detecting registers, described as excursion registers (35) to record the maximum horizontal (right-most) and vertical (topmost and bottom-most) extents of the shape, note that the left-most extent corresponds to the X value (of the Y axis) of the found coordinate.
- excursion registers (35) represent the size of the enclosing rectangle for the character shape.
- the second measurement uses a pair of random access memories (36) and (37) to record the left most and right-most horizontal pixel coordinates for every line (one pixel width) of the shape addressed by the vertical coordinate.
- the left and right pixels are indicated by 'L' and 'R' respectively in Figure 8C and represent the left and right profiles of the character shape.
- the character may now be extracted from the bit map memory by performing a raster scan (of the bit map memory) for the enclosing rectangle in the direction left to right, moving downward, allowing only those pixels whose coordinates are within the range enclosed by the left and right profiles of the character. This is achieved by passing the coordinate values of the right and left profiles and the coordinates of the enclosing rectangle to an extract control circuit ( 38 ) . This scan will have the effect of removing any
- the “aligned coordinate” is determined as the topmost and left-most coordinate of the enclosing rectangle for that character as illustrated in Figure 9.
- the “aligned coordinate” is loaded into the aligned coordinate register (39).
- the extracted shape is then fed into the normalise and randomise system function (11) at the same time a message is sent to the computer control system (13) (step 109, Fig.3) this message contains the size limits for the extracted shape and the "aligned coordinate"
- the aligned coordinate provides a datum for the character position and is used to reassemble
- the computer control system (13) assesses the enclosing rectangle for any of the following
- the classification operation is aborted (step 112, Fig.3) and the pixel group comprising the extracted character is classed as unidentifiable, i.e. an unrecognised character. If (c) condition applies, the unrecognised pixel group block is divided into a series of sub-blocks, by estimation of the character boundaries within the pixel group block, each sub-block is submitted separately to the classifier (step 113 Fig.3).
- the extracted (segmented) character shape is required to be "normalised” to a standard “enclosing rectangle” size, e.g. 32 x 32 pixels, prior to
- the normalisation can be achieved by initially scaling downwards in area by say 4:1, 16:1, 64:1 ratios, so that the scaled shape size is less than the required normalisation size and then using a look-up table approach to achieve the required normalisation result.
- Figure 10A illustrates the technique. The initial scaling (downwards) is achieved by the fixed scaling system (40) and the size
- variable scaling system (41) An example of a variable scaling system (41) is shown in more detail in Figure 11.
- the system has horizontal and vertical counters (42) and (43), driven by a clock which has a frequency chosen to suit the cycle time of the memories.
- the horizontal and vertical counters (42) and (43) are driven by a clock which has a frequency chosen to suit the cycle time of the memories.
- vertical counters (42), (43) are a pair of counters, which count from zero to full house one's during the scaling operation.
- Horizontal and vertical size registers (44) and (45) are provided, each comprising a 5 bit register whose contents do not change during the scaling operation, having been (previously) set to the actual size of the character shape within the bit map memory.
- the values held in the size registers are actually one less than the size of the shape to be scaled, i.e. a size register value of 00011 binary (3 decimal) indicates, to the scaling system, that the shape has a size of four pixels in that particular direction, horizontal or vertical.
- the horizontal and vertical scaling memories (46), (47) connected with the respective counters and registers (42)-(45) are
- Each 1024 by 5 bit scaling memory has a ten bit address, made up from the five bits (each) of the.
- a 5 bit data word will be available at the data out terminals. These are referred to as the modified x and y addresses, x and y corresponding to horizontal and vertical pixel
- the output memory (48) is a static Random Access Memory providing storage of 1024 by 1 pixels.
- the ten bit address to this output memory (48) is driven by the two counters (42), (43), horizontal and vertical, from zero to full house ones. Every pixel is thus addressed once, with the black or white value written into the particular location, being derived from the pixels stored in the bit map, that had been addressed by the address that has been modified as described.
- the variable scaling algorithm in relation to the operation of the scaling tables may be represented as follows.- Variable scaling, on a given axis, is determined by the maximum size of the intermediate block (Fig 10A), along that axis.
- N is the maximum (pixel) excursion in the intermediate block
- the 'Y' axis of the table (labelled Size S) equals (N-1).
- the 'X' axis of the table (labelled P) is the Pixel Number (P) for the final pixel block, i.e. P goes from 0 to 31, for a 32x32 normalised pixel block.
- the table value is the Pixel Number M in the intermediate block.
- the Size s equals (N-1) is 24 and the pixel states (black or white), in the final pixel block P, are determined from the pixel numbers (locations) derived from the tables.
- the scaled "normalised" character shape is now presented to the randomisation function (Fig.10A).
- the randomisation function generates pseudo-random n-tuples by use of another look-up table.
- the requirement is that the normalised (32x32) pixel group block must be mapped into a series of n-tuples such that:
- this represents the 32x32 pixel block mapped into 128 separate 8-tuples.
- the initial relationship between the pixels selected to form 8-tuples is random but once this random selection has been chosen it remains unchanged.
- a look-up table can be constructed to map, a given pixel location, to a given bit number, in a given 8-tuple. For the example shown in Fig 13A, if the 32x32 pixel block coordinates are set such that coordinate 0,0 corresponds to the top left hand corner of the map then the mapping
- the bit value would be '1' or '0' to correspond with the black or white state of the pixel so located.
- look-up tables may conveniently reside in:- either (a) non volatile memory (e.g. PROM, PAL etc), or (b) volatile memory (e.g. RAM), initialised by software on power-up of the machine.
- non volatile memory e.g. PROM, PAL etc
- volatile memory e.g. RAM
- the final, operation is to load the N-tuple buffer input to the classification system (12) and to send a
- the computer control system (13) now decides as to whether to proceed with the classification process, or to abort for the reasons previously stated, or to go into a classification sub-routine.
- the classification (normal routine) is initiated at step 115, Fig.3 by the computer control system (13).
- the classification system (12) as previously mentioned is a synchronous state machine. The approach is similar to that already described in relation to the synchronous st ⁇ te machine for the segmentation system. That is combin.ational transition functions are used to define the conditions and sequence of the state
- classifiers are described in the reference papers on N-tuple technology.
- the "classifier” is pre-trained with the range of patterns or classes it is required to recognise.
- the classifier responds with a ranking list of the classes, i.e. of the "most like” scores relative to the training set.
- the N-tuple method is essentially a means of comparing the unknown pattern with the range of patterns already “learnt” by the classifier, so that the classifier can make “most like” decisions.
- the top ranked (score) would (normally) be selected as that representing the pattern. In the preferred embodiment, this selection would also be dependent on:
- the classification system (12) is shown in more detail in Figure 14.
- the mode of operation of the classification system is illustrated in Figure 15.
- the classification system comprises an n-tuple counter (50) and a (Class) group counter (51). These counters are driven by the same system clock which drives the scaling system previously described.
- the n-tuple and group counters (50), (51) comprise a seven bit counter and a three bit counter, respectively connected as a ten bit counter. This counter counts from zero to full house one's during the response calculation operation. Initially the counters are set to zero (step 200, Figure 15).
- the output from the n-tuple counter (50) is a number which is used as an address for an n-tuple memory (49). The seven bits are used to sequentially address the 128 n-tuples stored within that memory.
- the n-tuple memory (49) will have previously been loaded from the normalise and randomise system function (11) and will contain a random n-tuple pattern of bits that represent the normalised shape, that has been extracted from the bitmap memory (7).
- n-tuples are addressed sequentially by the incrementing n-tuple counter (50) and the eight bit values of those n-tuples are presented to a
- discriminator memory (53) as addresses which are combined with both the seven bit output of the n-tuple counter (50) and the four bit output from the group counter (51) to produce a 19 bit address that is used by the discriminator memory (53).
- the discriminator memory (53) is a Random Access Memory constructed from Dynamic Random Access Memory elements which are organised, for the purposes of a parallel response discriminator, as an eight bit wide data bus memory system.
- the values read from the discriminator memory (53) are interpreted as single bit responses (step 202, Fig.15), these are required to be summed in order to produce total responses for all classes that are being tested for possible recognition.
- a collection of eight bit counters or incrementors (54) are connected to the data output terminals of the discriminator in such a fashion that they will increment, or count up by one, if the
- a class counter (55) is used to read the eight bit values that have accrued in the response incrementors (54) (step 204, Fig.15) and to write them into a table of responses stored in a responses memory (56) (step 205, Fig.15).
- the responses memory (56) consists of a static Random Access Memory, organised according to the number of classes of
- a message is sent to the computer control system (13), providing classification data (step 116, Fig.3).
- the computer control system (13) then proceeds with the initial post-processing (stage 1) (to be described) and recommences the segmentation routine (step 117, Fig.3), i.e. returns to step 108, Fig.3.
- the segmentation/classification sequence continues until all the characters are classified, i.e. until all the patterns within the image bitmap (7) have been extracted, segmented, normalised and classified.
- the computer control system (13) continues the post-processing (stage 2).
- the initial post-processing (stage 1) is to check for items, such as punctuation, known ambiguities, nonsense (based on known response values), invalid classes etc. and to complete the identification of the character.
- the final post-processing (stage 2) is to
- the recognition unit for subsequent display, the entire pixel group representing the character, this allows for a human interrogation (intervention).
- the output from the stage 1 postprocessing should be loaded into a "shape buffer" memory store.
- stage 1 postprocessing software has to arrange to "tag" each result with the image map location data previously received (steps 107, 109, Fig.3); this information may then be used, in the embodiment for text recognition, to rec ⁇ mpose the page and ensure the correct ordering of the recognised characters, as described for stage 2 post-processing.
- the computer control system (13) may decide to require that the character is reclassified, e.g. by presentation to a sub-set of classes as
- the order in which the discriminator memory (53) is accessed may be arranged in a special form, again as previously described.
- the first classes accessed may comprise the vowels.
- the computer control system (13) may compare each response against
- Post-processing can be used to carry out other functions, to improve the error rate and to provide special facilities, for example:
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Input (AREA)
- Character Discrimination (AREA)
Abstract
Description
Claims
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB888821024A GB8821024D0 (en) | 1988-09-07 | 1988-09-07 | Image recognition |
| GB8821024.0 | 1988-09-07 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO1990003012A2 true WO1990003012A2 (en) | 1990-03-22 |
| WO1990003012A3 WO1990003012A3 (en) | 1990-07-26 |
Family
ID=10643220
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/GB1989/001043 Ceased WO1990003012A2 (en) | 1988-09-07 | 1989-09-06 | Image recognition |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP0433359A1 (en) |
| JP (1) | JPH04502526A (en) |
| GB (1) | GB8821024D0 (en) |
| WO (1) | WO1990003012A2 (en) |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7546236B2 (en) | 2002-03-22 | 2009-06-09 | British Telecommunications Public Limited Company | Anomaly recognition method for data streams |
| US7570815B2 (en) | 2002-03-22 | 2009-08-04 | British Telecommunications Plc | Comparing patterns |
| US7574051B2 (en) | 2005-06-10 | 2009-08-11 | British Telecommunications Plc | Comparison of patterns |
| US7593602B2 (en) | 2002-12-19 | 2009-09-22 | British Telecommunications Plc | Searching images |
| US7620249B2 (en) | 2004-09-17 | 2009-11-17 | British Telecommunications Public Limited Company | Analysis of patterns |
| US7653238B2 (en) | 2003-12-05 | 2010-01-26 | British Telecommunications Plc | Image filtering based on comparison of pixel groups |
| US8040428B2 (en) | 2005-12-19 | 2011-10-18 | British Telecommunications Public Limited Company | Method for focus control |
| US8135210B2 (en) | 2005-07-28 | 2012-03-13 | British Telecommunications Public Limited Company | Image analysis relating to extracting three dimensional information from a two dimensional image |
| BE1020588A5 (en) * | 2011-08-11 | 2014-01-07 | Iris Sa | METHOD OF RECOGNIZING FORMS, COMPUTER PROGRAM PRODUCT, AND MOBILE TERMINAL. |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4104616A (en) * | 1976-01-28 | 1978-08-01 | Sumitomo Electric Industries, Ltd. | Hand operated optical character recognition system |
| DE3026055C2 (en) * | 1980-07-09 | 1984-01-12 | Computer Gesellschaft Konstanz Mbh, 7750 Konstanz | Circuit arrangement for automatic character recognition |
| SE448922B (en) * | 1980-10-21 | 1987-03-23 | Ibm Svenska Ab | METHOD FOR PROCESSING VIDEO DATA BY AN OPTICAL SIGN IDENTIFICATION SYSTEM WITH A CHARACTER IDENTIFICATION DEVICE IN AN OPTICAL DOCUMENT READER |
-
1988
- 1988-09-07 GB GB888821024A patent/GB8821024D0/en active Pending
-
1989
- 1989-09-06 EP EP89910158A patent/EP0433359A1/en not_active Withdrawn
- 1989-09-06 WO PCT/GB1989/001043 patent/WO1990003012A2/en not_active Ceased
- 1989-09-06 JP JP1509636A patent/JPH04502526A/en active Pending
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7546236B2 (en) | 2002-03-22 | 2009-06-09 | British Telecommunications Public Limited Company | Anomaly recognition method for data streams |
| US7570815B2 (en) | 2002-03-22 | 2009-08-04 | British Telecommunications Plc | Comparing patterns |
| US7593602B2 (en) | 2002-12-19 | 2009-09-22 | British Telecommunications Plc | Searching images |
| US7653238B2 (en) | 2003-12-05 | 2010-01-26 | British Telecommunications Plc | Image filtering based on comparison of pixel groups |
| US7620249B2 (en) | 2004-09-17 | 2009-11-17 | British Telecommunications Public Limited Company | Analysis of patterns |
| US7574051B2 (en) | 2005-06-10 | 2009-08-11 | British Telecommunications Plc | Comparison of patterns |
| US8135210B2 (en) | 2005-07-28 | 2012-03-13 | British Telecommunications Public Limited Company | Image analysis relating to extracting three dimensional information from a two dimensional image |
| US8040428B2 (en) | 2005-12-19 | 2011-10-18 | British Telecommunications Public Limited Company | Method for focus control |
| BE1020588A5 (en) * | 2011-08-11 | 2014-01-07 | Iris Sa | METHOD OF RECOGNIZING FORMS, COMPUTER PROGRAM PRODUCT, AND MOBILE TERMINAL. |
Also Published As
| Publication number | Publication date |
|---|---|
| WO1990003012A3 (en) | 1990-07-26 |
| EP0433359A1 (en) | 1991-06-26 |
| GB8821024D0 (en) | 1988-10-05 |
| JPH04502526A (en) | 1992-05-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP4607633B2 (en) | Character direction identification device, image forming apparatus, program, storage medium, and character direction identification method | |
| US5854854A (en) | Skew detection and correction of a document image representation | |
| CA2192436C (en) | System and method for automatic page registration and automatic zone detection during forms processing | |
| US5539841A (en) | Method for comparing image sections to determine similarity therebetween | |
| US5335290A (en) | Segmentation of text, picture and lines of a document image | |
| US5410611A (en) | Method for identifying word bounding boxes in text | |
| US4903312A (en) | Character recognition with variable subdivisions of a character region | |
| CA1160347A (en) | Method for recognizing a machine encoded character | |
| US5033104A (en) | Method for detecting character strings | |
| US6014450A (en) | Method and apparatus for address block location | |
| EP0543599A2 (en) | Method and apparatus for image hand markup detection | |
| Gatos et al. | Automatic page analysis for the creation of a digital library from newspaper archives | |
| WO1990003012A2 (en) | Image recognition | |
| JPS62281085A (en) | Extruction of character component | |
| JPS5991582A (en) | Character reader | |
| EP1010128B1 (en) | Method for performing character recognition on a pixel matrix | |
| Balm | An introduction to optical character reader considerations | |
| EP0692768A2 (en) | Full text storage and retrieval in image at OCR and code speed | |
| JP3406942B2 (en) | Image processing apparatus and method | |
| JP3072126B2 (en) | Method and apparatus for identifying typeface | |
| EP0201909A2 (en) | Procedure for automatic reading of images and device for carrying out this same procedure | |
| JP2917396B2 (en) | Character recognition method | |
| JPH05298487A (en) | Alphabet recognizing device | |
| CN115761761A (en) | Text recognition method and device, electronic equipment and storage medium | |
| JPH0433074B2 (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A2 Designated state(s): JP US |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): AT BE CH DE FR GB IT LU NL SE |
|
| AK | Designated states |
Kind code of ref document: A3 Designated state(s): JP US |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): AT BE CH DE FR GB IT LU NL SE |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 1989910158 Country of ref document: EP |
|
| WWP | Wipo information: published in national office |
Ref document number: 1989910158 Country of ref document: EP |
|
| WWW | Wipo information: withdrawn in national office |
Ref document number: 1989910158 Country of ref document: EP |