IMAGE ANALYSIS FOR IMAGE DATABASE INDEXING
This invention relates to image analysis, and in particular to image processing and database management, especially for the indexing, categorisation and content-based retrieval of electronically stored images. The invention further relates to image representation in general, and particularly to image representation for the matching, indexing, categorisation or retrieval of images based on their content.
In present times, literally millions of images are created each day, and there is accordingly a huge body of image information available which may be stored as files in an image database. Plainly, any image file must be retrievable from such a database and printed or otherwise displayed on demand.
It is a relatively straightforward matter to search a database of text files for the presence of a given alphanumeric string, because a limited linguistic or mathematical vocabulary is associated with each text file, and even large databases can be searched quite rapidly using keywords as required. It would of course be possible to attach one or more keywords as an alphanumeric string to any image file and to search the database on that basis, but this is not satisfactory: different people might ascribe different keywords to any given image, or the same keywords to a wide variety of different images. People tend to prefer to interrogate an image database according to what the image itself contains rather than according to some label that is put on that image.
The problem therefore arises of developing an image analysis technique whereby the images held in a database may be analysed according to their
content and the database may be instructed to supply all images which look similar to a target or query image.
Useful reviews of the recent development of systems for image retrieval appear in sections of US Patents 5,852,823 (Microsoft) and 5,893,095 (Virage, Inc.), in particular in the sections respectively headed "Background of the Disclosure" and "Background of the Invention".
In particular, US 5,852,823 claims a system, responsive to a query image set, for retrieving, from a stored database, a desired image that is visually similar to the image set, said set containing at least one query image, said system comprising: an image database having stored images therein; a signature generator, responsive to an input image and having three successive convolution filters, for producing a signature for the input image, the signature containing a numeric measure of each of a plurality of pre-defined characteristics of the input image; wherein the signature generator, in response to the query image set, generates a corresponding signature for each image in the query image set, and, in response to a test image accessed from the stored database, generates a corresponding test signature; a statistics generator, responsive to the signature for each image in the query image set, for providing a separate pre-defined statistical measure, for each of the pre-defined characteristics, that represents variability of said each pre-defined characteristic across all images in the query image set; an image database manager, operative in conjunction with the image database, for retrieving successive ones of a plurality of stored images from the database and for routing each one of the plurality of successive images as the test image to the signature generator so as to generate a plurality of corresponding test signatures;
a comparator for comparing, across all of the pre-defined characteristics, the statistical measure for each of the pre-defined characteristics against a corresponding numeric measure in each one of the test signatures so as to yield a similarity measure, the similarity measure for said one of the test images being reflective of a degree of visual similarity between the one test image and the query image set; and a manager, responsive to the similarity measure for each of the test images and operative in conjunction with the image database manager, for selecting at least one of the test images having a highest relative similarity measure as a retrieved image and providing the retrieved image as output, and it describes an apparatus and an accompanying method for generating a semantically based, linguistically searchable, numeric descriptor of a pre-defined group of input images and which is particularly useful in a system for automatically classifying individual images, on a numerical basis, in, e.g., an image database, and, through a query-by-example paradigm, retrieving a desired image(s) therefrom. Specifically, a signature is computed for each image in a set using multi-level iterative convolution filtering, with pixel values supplied as input to each filtering level being separately convolved with each one of a set of predefined Gaussian kernels. Average and variance vectors, as collective numeric descriptors of all the images in the set, are separately computed across corresponding elements in all the image signatures for the set. A linguistic term, semantically descriptive of all the images in the set, is associated with the numeric descriptors of this set and, with this set, is stored in a database. For image retrieval, the descriptor for any set is accessed by a textual search through the database using the appropriate linguistic term.
The descriptor is then compared against accessed signatures for other images in the database in order to retrieve an image, among those stored in the database, that is the most similar to those in the set associated with the descriptor.
US 5,893,095 claims a search engine, comprising: a function container capable of storing primitive functions; a registration interface storing functions to the function container; and a primitive supplying primitive functions to the registration interface, wherein the primitive functions include an analysis function capable of extracting features from an object, together with a method of comparison using such a search engine, and it goes on to describe a system and method for content-based search and retrieval of visual objects. A base visual information retrieval (VIR) engine utilises a set of universal primitives to operate on the visual objects. An extensible VIR engine allows custom, modular primitives to be defined and registered. A custom primitive addresses domain specific problems and can utilize any image understanding technique. Object attributes can be extracted over the entire image or over only a portion of the object. A schema is defined as a specific collection of primitives. A specific schema implies a specific set of visual features to be processed and a corresponding feature vector to be used for content-based similarity scoring. A primitive registration interface registers custom primitives and facilitates storing of an analysis function and a comparison function to a schema table. A heterogeneous comparison allows objects analysed by different schemas to be compared if at least one primitive is in common between the schemas. A threshold-based comparison is utilised to improve performance of the VIR engine. A distance between two feature vectors is computed in any of the comparison processes so as to generate a similarity score.
One particularly promising approach to image database indexing and retrieval is the query-by-image-content (QBIC) method (W. Niblack et al, "Querying images by content using colour, texture and shape", Proc. SPIE Conference on Storage and Retrieval for Image and Video Databases, vol. 1908, pp. 173-187, 1993), whereby the visual contents of
the images, such as colour distribution (represented by a colour histogram), texture attributes and other image features are extracted from the image using computer vision/image processing techniques and used as indexing keys. In an image database, these visual keys are stored along with the actual image data, and image retrieval from the database is based on the matching of the model's visual keys with those of the query image(s).
Early approaches used colour distribution alone for content-based image indexing and retrieval (M. Swain and D. Ballard, "Colour Indexing", International Journal of Computer Vision, vol. 7, pp. 11-32, 1991). Because a colour histogram is a global measure and contains no information about the relative spatial distribution of the colours, the success of the colour histogram method is quite limited. Recognising the importance of relative spatial information, researchers have proposed various methods to incorporate such information in the indexing and retrieval for image databases. Although it is not possible to be exhaustive, representative approaches to incorporating the spatial information into content-based image indexing and retrieval include: classical texture descriptors (W.Y. Ma and B.S. Manjunath, "NeTra: A toolbox for navigating large image databases", Multimedia Systems, vol. 7, pp. 184-198, 1999), image segmentation ("blobworld") (C. Carson et al, "Blobworld: A system for region-based image indexing and retrieval", Proc. Int. Conf. Visual Information Systems, Springer-Verlag, 1999), joint histograms of colours and edges (G. Pass and R. Zabih,
"Comparing images using joint histograms", Multimedia Systems, vol. 7, pp. 234-240, Springer-Verlag, 1999), and the colour correlogram (J. Huang et al, "Image indexing using colour correlogram", IEEE CVPR, pp. 762-768, 1997).
Recent work has also proposed the use of a binary space partitioning tree as an efficient representation for capturing the colours and their spatial distribution information (G. Qiu and S. Sudirman, "Representation and retrieval of colour images using binary space partitioning tree", 8th Colour Imaging Conference, Scottsdale, Arizona, USA, Nov. 2000, pp. 195-201).
A survey of various techniques for indexing and retrieval can be found in (Y. Rui, T.S. Huang and S.F. Chang, "Image Retrieval: Current Techniques, Promising Directions, and Open Issues", Journal of Visual Communication and Image Representation, Vol. 10, pp. 39-62, 1999).
It is an object of this invention to provide a new and alternative system for analysing and representing the contents of an image which analysis or representation may then be applied in various ways relating to image database management including the indexing, comparison, retrieval and recognition of images based on the image content, and which has certain advantages over previously known techniques as will be referred to later in this specification.
The present invention provides an image analysis system, which may be a processing or representation system, in which an input image is deconstructed into primitive images corresponding to palette features of the input image, the geometric properties of those primitive images are measured, and populations of different geometrical measures of the respective palette features are tabulated.
The present invention also provides an image indexing system in which the image is deconstructed into a plurality of primitive images each reflecting a palette feature of the original image and then tabulating the populations of distinct geometrical features of each such primitive image.
The invention includes an image retrieval system for retrieving images similar to a query image from an image database in which an algorithm is applied to the query image to deconstruct the query image into a plurality of primitive query images each reflecting a palette feature of the original query image and then tabulate the populations of distinct geometrical features of each such primitive query image to form a query table, and then comparing the query table with image tables created by applying said algorithm to images held in said database.
The invention includes an image database which contains image tables created by applying an algorithm to each of a set of images to deconstruct each of those images into a plurality of primitive images each reflecting a palette feature of the original image and then tabulating the populations of distinct geometrical features of each such primitive image to form a lookup table.
The invention also includes a workstation programmed with an algorithm adapted to be applied to an image to deconstruct that image into a plurality of primitive images each reflecting a palette feature of the original image and then to tabulate the populations of distinct geometrical features of each such primitive image to form a digital table.
Such a workstation may be remote from, or wired to, a database storing images optionally operated on by said algorithm. The workstation is suitably programmed to apply the algorithm to a query image to form a query table and then to compare the query table with image tables created by applying said algorithm to images held in said database, in order to retrieve images similar to the query image.
The invention extends to a method of retrieving images similar to a query image from an image database in which an algorithm is
applied to the query image to deconstruct the query image into a plurality of primitive query images each reflecting a palette feature of the original query image and then tabulate the populations of distinct geometrical features of each such primitive query image to form a query table, and then comparing the query table with image tables created by applying said algorithm to images held in said database.
The invention also extends to a method of indexing images in which an algorithm is applied to the image to deconstruct it into a plurality of primitive images each reflecting a palette feature of the original image and then to tabulate the populations of distinct geometrical features of each such primitive image.
The invention further extends to an object recognition system in which an image is formed of the object to be recognised, an algorithm is applied to the object image to deconstruct the object image into a plurality of primitive object images each reflecting a palette feature of the original object image and then tabulate the populations of distinct geometrical features of each such primitive object image to form a query table, and then comparing that query table with image tables created by applying said algorithm to images held in an image database.
The invention includes a method of creating an image content table which comprises: defining palette features of an image and deconstructing the image into a plurality of binary primitive images each reflecting a said palette feature;
analysing geometrical features of each such primitive image; and tabulating populations of different geometrical features of each such primitive image.
The invention also includes an image content table which comprises a tabulated list of populations of different geometrical features of each of a plurality of primitive binary images.
Preferably the primitive images are binary images having a plurality of pixels each having one of two values associated with it.
Preferably each pixel in the binary image is associated with a respective part of the deconstructed image from which it is formed. Each said part of the deconstructed image most conveniently comprises a pixel of the deconstructed image.
The primitive images may each indicate which of said parts have the respective palette feature and which do not.
Preferably there are defined a plurality of classifications of at least one image feature of an image, and each palette feature corresponds to a respective one of the classifications.
Preferably each palette feature has a palette feature vector associated with it and each pixel of the image is classified by defining a pixel feature vector for the pixel and determining which of the palette feature vectors it is closest to.
The at least one image feature may include surface texture, or colour.
Preferably each image feature has a plurality of possible categories, and each defined palette feature corresponds to a plurality of said categories.
Preferably geometric features of each primitive image are recorded in an array, for example a one-dimensional array, and the arrays for the primitive images are combined to form a table.
Preferably the geometric properties of the primitive images include geometric properties of connected elements within the primitive images.
The geometrical features tabulated may be of various kinds. Generally, the tabulation is based on an analysis of the rows and columns of pixels making up the primitive binary image, though other analytical systems are possible, for example by referring to blocks of pixels, such as connected blocks of pixels.
In some embodiments of the invention, the tabulation comprises a table of the populations of rows and columns of pixels of each palette feature of a given length or height. For example, the table may reflect the presence of R1 rows made up of one pixel, R2 rows made up of two pixels and so on, and of C1 columns one pixel high, C2 columns which are two pixels high, and so on. There would be a table for each palette feature selected for analysis. The geometrical features may thus be considered as rows and columns of a given pixel length or height.
In other embodiments of the invention, the tabulation comprises the total number of pixels containing the selected palette feature in each row and column. The geometrical feature is thus simply the active pixel content of each row or column.
In yet other embodiments of the invention, the geometrical feature maps to the total size of each continuous array of pixels of a given palette feature.
The palette feature analysed may be a colour tone, a pattern or texture, or any other tonal value contributing to the appearance of the image which is being subjected to the algorithm. In particularly preferred embodiments of the invention, at least one said palette feature represents a collocation of a plurality of visually distinguishable tonal values of the respective image. Thus the algorithm may be adapted to reduce the colour palette prior to tabulation. For example, the image may be reduced to a colour palette of 128 colours, or of 64 or 32 or even fewer colours. This of course greatly reduces the memory space required to store the tabular information if it is desired to maintain an entire database in tabular form.
The algorithm may be applied in such a way that, for example, all blue tones are combined to a single blue palette feature and the image is tabulated on that basis. The invention is not of course confined to the amalgamation of like colours into a single palette feature. But the algorithm applied to the stored images and the query image must be compatible.
The present invention takes the previously known techniques a step further by providing a method of representing image contents based on the co-occurrence of geometric features and surface appearance features and summarising the co-occurrence statistics in a tabular form. A number of innovative features of the current invention make this invention advantageous over prior art technologies. The use of the co-occurrence of the surface appearance features (such as colour tone, texture characteristics etc) and the geometric structure (shape, size, and other parameters) of image areas of homogeneous appearance makes the representation more discriminative, thus enabling more accurate image retrieval. The decomposition of the image into binary primitive form also allows fast and flexible processing. The present invention is not only
applicable to image indexing and retrieval but also has more general application in image matching and object recognition.
Preferred embodiments of the invention will now be described in greater detail with reference to the accompanying diagrammatic drawings in which:
Figure 1 is a general schematic diagram illustrating image content table generation;
Figure 2 illustrates a palette feature table relating to different colours;
Figure 3 illustrates a palette feature table relating to different texture features;
Figure 4 illustrates a classifier;
Figure 5 shows a binary image used to illustrate the construction of horizontal and vertical pixel arrays;
Figure 6 shows an 8-connected pixel array and a 4-connected pixel array;
Figure 7 illustrates connected component labelling;
Figure 8 illustrates how a primitive binary image may be divided into horizontal and vertical patches;
Figure 9 is an example illustrating the calculation of vertical and horizontal projections;
Figure 10 is an image content table (ICT) based on Horizontal and Vertical Line Measurement;
Figure 11 is an image content table (ICT) based on Connected Component Measurement;
Figure 12 is an image content table (ICT) based on Horizontal and Vertical Projections;
Figure 13 illustrates an application of the Image Content Table to Image Database Indexing and Retrieval;
Figure 14 illustrates the construction of object class prototypes/models and a recognition model; and
Figure 15 illustrates steps involved in the recognition of an unknown object.
Image Content Table
The image analysis method of the invention uses a novel concept of image content table (ICT) . Fig. 1 shows the schematic of a method for generating an ICT.
The input image is first passed through a classifier, which classifies the input image pixels into a number of classes (N) according to the palette features (PFs) which are to be analysed. These palette features result in N feature vectors characterising image tonal value properties such as colour tones or surface pattern or texture properties. It is not essential that each individual visually or otherwise distinguishable colour or other tonal value should be assigned to a palette feature giving rise to a feature vector. Indeed, it is preferred that the various colour tones of the input image should be assigned to a reduced colour palette in which a single palette feature represents a plurality of tonal values of the input image. Thus the various colour tones of the input image are preferably assigned to one of, for example, 128 different palette features. In a simpler system requiring even less memory, the palette used for analysing the input image contains 64 or 32 or 16 or 8 colours or features.
According to the result of the classifier, N primitive binary images (PBI) are formed, each of the same dimensions as the input image. Each PBI corresponds to a palette feature (vector), and the pixel values of the PBI
indicate the presence or absence of the palette feature in the input image at the pixel locations. Generally, a pixel value of 1 in the PBI indicates the presence of the feature and a pixel value of 0 indicates the absence of that feature. (However, since it is a binary image, assigning the pixel values the other way round is also possible.) The primitive binary images are then analysed by the binary image analyser, whose primary function is to characterise the geometric features of the 1-valued pixels in the PBIs. A two-dimensional table, the image content table (ICT), which characterises the frequencies of the co-occurrence of the palette features and the geometric features, is then generated.
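As a compact, hypothetical illustration of the pipeline just described (the function names and the tiny two-colour palette are invented for this sketch, and simple horizontal run-length counting stands in for the geometric analysis detailed later):

```python
# Illustrative sketch only: classify pixels against a palette and tabulate a
# simple geometric feature (horizontal run lengths) per palette feature,
# yielding a small image content table.

def nearest_palette_index(pixel, palette):
    """Index of the palette feature vector closest to the pixel (Euclidean)."""
    return min(range(len(palette)),
               key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, palette[i])))

def build_ict(image, palette, max_run=4):
    """image: 2-D list of RGB tuples. Returns an N x max_run table whose
    entry [i][k] counts horizontal runs of length k+1 for palette feature i.
    (Runs longer than max_run are clamped here, a simplification.)"""
    ict = [[0] * max_run for _ in range(len(palette))]
    for row in image:
        indices = [nearest_palette_index(px, palette) for px in row]
        run_feat, run_len = indices[0], 1
        for idx in indices[1:] + [None]:   # None is a sentinel that flushes the last run
            if idx == run_feat:
                run_len += 1
            else:
                ict[run_feat][min(run_len, max_run) - 1] += 1
                run_feat, run_len = idx, 1
    return ict
```

For a small image over a red/blue palette this yields one table row per palette feature, each row being the run-length histogram of that feature's primitive binary image.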
These concepts, the palette feature vector, the classifier, the primitive binary images, the binary image analyser and the image content table generator will now be described in detail.
Palette Feature Vectors
The palette features are contained in a pre-determined table consisting of N entries, each pointing to an image feature vector. There are various properties of the input image which may be considered as image features: these include (1) colour, and (2) surface texture properties. Other features are possible. Schematically, the palette feature table for the case where colour tones are used as image features is shown in Fig. 2. As can be seen, each palette feature is characterised by a 3-dimensional vector in an appropriate colour space; for illustration purposes, an RGB colour space is used here. It is to be noted that the use of other colour spaces is also possible. Instead of, or in addition to, the use of colour tone as a palette feature, we can also use palette features that characterise the surface texture or pattern or other properties of the image. In this case, there are also many methods for generating the characteristic palette feature vector. For example, one particular type of method (see T. Randen and J. H. Husøy,
"Filtering for texture classification: A comparative study", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 4, pp. 291-310, April 1999) involves the use of image filtering, where a bank of filters is applied to the image and the texture property at each pixel location is characterised by a texture feature vector formed from the output of the filter bank. Schematically, the palette table for the case when texture properties are used as image features is shown in Fig. 3, where it is assumed the texture property of a pixel location is characterised by an m-dimensional texture feature vector, obtained for example by a filtering-based approach. The texture features can be used to describe the texture surface of grey-scale as well as colour images (see A. Jain and G. Healey, "A multiscale representation including opponent color features for texture recognition", IEEE Transactions on Image Processing, vol. 7, no. 1, pp. 124-128, January 1998).
Generally, there will be only one palette feature table in the system, either based on colour or based on texture. A combination of the two palettes is also possible. In any case, the N feature vectors should be chosen as statistically representative vectors in the chosen feature space. There are many well-established methods that can be used, almost all based on some form of vector quantization [Reference 4], using as many example images as possible to generate training samples. The process of obtaining the palette feature vectors need only be performed once, and after the feature vectors are generated they are fixed and stored for use by the classifier. (Generating the palette feature table can easily be done by those skilled in the art; see e.g. Gervautz, M. & Purgathofer, W., "A Simple Method for Color Quantization: Octree Quantization", Graphics Gems, Academic Press, 1990, and A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, 1992.)
Classifier
There are well-established methods of building the classifier. As an illustration, a minimum distance classifier based on the Euclidean distance measure is described here. The schematic of the classifier is illustrated in Fig. 4. Each pixel, P, in the input image is described by its pixel feature vector (PFV). In the case where colours are used as palette features, this PFV is simply the 3-dimensional colour vector (e.g., R, G, and B). Let X denote the PFV, and X = (rx, gx, bx). In the case where surface texture features are used as palette features, this PFV is the m-dimensional texture characteristic feature vector. Again, using X to denote the PFV, X = (Tx(1), Tx(2), ..., Tx(m)). Each pixel's PFV in the original input image is compared with the N palette feature vectors in the Euclidean space. The classifier outputs the index of the palette feature vector that is closest to the PFV, X.
Formally, the classification process illustrated in Fig. 4 can be described as follows:
A) For each pixel P in the input image, compute the PFV, X
B) Calculate the Euclidean distances between X and the N palette feature vectors, PFi (i = 1, 2, ..., N), according to: d(x, i) = ||X - PFi||, for all i = 1, 2, ..., N
In the case where colours are used as palette features, the distances may be calculated as d(x, i) = √((rx - ri)² + (gx - gi)² + (bx - bi)²), for all i = 1, 2, ..., N, where (ri, gi, bi) is the i-th colour palette feature vector in the palette feature table.
Similarly, in the case texture features are used as the palette feature, the distances are calculated as
d(x, i) = √(Σj=1..m (Tx(j) - Ti(j))²), for all i = 1, 2, ..., N
where (Ti(1), Ti(2), ..., Ti(m)) is the i-th texture palette feature vector in the palette feature table.
C) Find the minimum distance, and output its index as the pixel feature index PFI, i.e.
PFI(P) = k, if d(x, k) < d(x, i) for all i = 1, 2, ..., N and i ≠ k
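A minimal sketch of steps A) to C) in code (the function name is illustrative; `math.dist` computes the Euclidean distance):

```python
import math

def classify_pixel(x, palette_features):
    """Minimum-distance classifier: return the pixel feature index PFI,
    i.e. the 1-based index (as in the text) of the palette feature vector
    closest to the pixel feature vector X in Euclidean distance."""
    distances = [math.dist(x, pf) for pf in palette_features]
    return distances.index(min(distances)) + 1
```

The same function serves for colour (3-dimensional) and texture (m-dimensional) PFVs, since only the vector length differs.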
Creating Primitive Binary Images
After classification, each pixel in the input image will have been assigned an index, the pixel feature index (PFI), ranging from 1 to N, indicating the assignment of the pixel to one of the N palette features. From the association of the pixels and the palette features, N binary images, each of the same dimensions as the input image, are constructed. Let P(x, y), x = 1, 2, ..., L; y = 1, 2, ..., M; be the pixels of an L x M image (grey scale or colour) at location (x, y). Let PBIi(x, y), i = 1, 2, ..., N; x = 1, 2, ..., L; y = 1, 2, ..., M; be the pixels of the i-th primitive binary image at location (x, y). Since after classification each P(x, y) will have been assigned a label associating it with a palette feature, it is a simple matter to assign the pixel values in the primitive binary images. Formally we have:

PBIi(x, y) = 1 if PFI(P(x, y)) = i, and PBIi(x, y) = 0 otherwise.
That is, a pixel in the i-th primitive binary image will have a value of 1 if the pixel in the corresponding location in the input original image is associated with the i-th palette feature in the palette feature table, and a value of 0 if the pixel in the corresponding location in the input original image is assigned to any other palette feature.
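A sketch of this construction, assuming the classifier's output is available as a 2-D array of pixel feature indices (the function name is illustrative):

```python
def make_pbis(pfi, n):
    """pfi: 2-D list of pixel feature indices in 1..n, one per input pixel.
    Returns the n primitive binary images: PBI_i(x, y) = 1 where the pixel
    was assigned to palette feature i, and 0 elsewhere."""
    return [[[1 if v == i else 0 for v in row] for row in pfi]
            for i in range(1, n + 1)]
```

Each of the n returned binary images has the same dimensions as the input index array, and every input pixel is 1-valued in exactly one of them.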
Binary Image Analyser
After representing the image as a series of primitive binary images, we analyse each binary image to extract geometric measurements of connected 1-valued pixels. There are many possibilities for measuring the geometric structure of connected components in a binary image: three possible measurements are illustrated.
Horizontal and Vertical Line Measurement
In this method, we measure the length of connected 1-valued pixels within the PBIs along the horizontal and vertical directions. A set of line length values, L1, L2, ..., LK, is pre-set. The image is first scanned along the horizontal direction to form a 1 x K array, GHL = {GHL(1), GHL(2), ..., GHL(K)}, whose elements are the numbers of connected horizontal lines of lengths L1, L2, ..., LK. Then the image is scanned along the vertical direction to form another 1 x K array, GVL = {GVL(1), GVL(2), ..., GVL(K)}, whose elements are the numbers of connected vertical lines of lengths L1, L2, ..., LK. Note that the number of line length values, and indeed the values of the line lengths, for the horizontal and vertical directions do not have to be the same. To illustrate how GHL and GVL are formed we use the binary image shown in Fig. 5. For illustration convenience, we set K = 4 and L1 = 1, L2 = 2, L3 = 3, L4 = 4. (Other values of K and Li, i = 1, 2, ..., K are possible.) Clearly we have GHL = {9, 1, 2, 3} and GVL = {16, 3, 0, 4}. Notice that if a connected line is of length L and L > LK, then it is counted as INT(L/LK) lines of length LK and one line of length (L - LK × INT(L/LK)), where INT(x) denotes the integer part of the value x.
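The horizontal and vertical line measurement, including the INT(L/LK) splitting rule for over-long runs, can be sketched as follows (a minimal illustration; it assumes the pre-set line lengths include every value from 1 up to LK, as in the K = 4 example above):

```python
def run_lengths(line):
    """Lengths of maximal runs of 1s in a 1-D binary sequence."""
    runs, count = [], 0
    for v in line:
        if v:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs

def line_measurement(pbi, lengths):
    """Return (GHL, GVL) for one primitive binary image (2-D list of 0/1).
    A run longer than lengths[-1] (i.e. LK) is split, as in the text, into
    INT(L/LK) lines of length LK plus one line of the remaining length."""
    k = len(lengths)
    def tally(lines):
        g = [0] * k
        for line in lines:
            for run in run_lengths(line):
                whole, rest = divmod(run, lengths[-1])
                if run > lengths[-1]:
                    g[-1] += whole
                    run = rest
                if run in lengths:
                    g[lengths.index(run)] += 1
        return g
    ghl = tally(pbi)                  # horizontal scan, row by row
    gvl = tally(list(zip(*pbi)))      # vertical scan via transposition
    return ghl, gvl
```

The vertical scan simply transposes the image and reuses the horizontal tally, reflecting the symmetry of the two directions.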
Connected Component Measurement
In this method, connected 1-valued pixels are given the same label. Two pixels can be 8-connected or 4-connected as illustrated in Figs. 6a and 6b respectively. In Fig. 6a, pixel 1 and pixels 2, 3, 4, 5, 6, 7, 8 and 9 are 8-connected, and in Fig. 6b, pixel 1 and pixels 2, 3, 4 and 5 are 4-connected. For illustration purposes, 8-connected pixels are considered here.
There are well established algorithms to label connected pixels [Reference 5] . As an illustration, we use the binary image in Fig. 7 as an example in which all pixels in a group of connected pixels have been given the same label.
All pixels with the same label form a connected component. We measure the geometric properties of the connected components. There are many parameters of a connected component which can be used to characterise its geometric property. As an illustration, we could use the size of a connected component as a measure of its geometric property. Other measures such as the shape, the second order moments, the centroid etc. can be obtained easily and used instead (M. Sonka, V. Hlavac and R. Boyle, Image Processing, Analysis and Machine Vision, Chapman & Hall, London, 1994).
First, we divide the sizes of the connected components into K categories, each denoting a range of sizes. Let S = {S1, S2, ..., SK} be the size category vector, where Si denotes a component size threshold. Then, we form a 1 x K array, Gs = {Gs(1), Gs(2), ..., Gs(K)}, whose elements are the numbers of connected components whose sizes fall within each category. Gs(1) is the number of components whose size is less than or equal to S1. Gs(i), for i = 2, 3, ..., K-1, is the number of components whose size is greater than Si-1 and less than or equal to Si. Gs(K) is the number of components whose size is greater than SK-1.
As an illustration, we again use Fig. 7 as an example. If we set S = {S1, S2, ..., SK} = {1, 2, 3, 4, 5}, it is easy to see that Gs = {Gs(1), Gs(2), ..., Gs(K)} = {2, 1, 0, 2, 2}. If we set S = {S1, S2, ..., SK} = {2, 4, 6, 8, 10}, it is easy to see that Gs = {Gs(1), Gs(2), ..., Gs(K)} = {3, 2, 0, 1, 1}.
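The labelling and size-category counting can be sketched as follows (a minimal breadth-first flood fill over the 8-neighbourhood; function names are illustrative and any standard labelling algorithm may be substituted):

```python
from collections import deque
from bisect import bisect_left

def label_components(pbi):
    """Label 8-connected components of 1-valued pixels in a binary image
    (a list of rows); return the list of component sizes."""
    M, N = len(pbi), len(pbi[0])
    seen = [[False] * N for _ in range(M)]
    sizes = []
    for x in range(M):
        for y in range(N):
            if pbi[x][y] and not seen[x][y]:
                # breadth-first flood fill over the 8-neighbourhood
                queue, size = deque([(x, y)]), 0
                seen[x][y] = True
                while queue:
                    cx, cy = queue.popleft()
                    size += 1
                    for dx in (-1, 0, 1):
                        for dy in (-1, 0, 1):
                            nx, ny = cx + dx, cy + dy
                            if (0 <= nx < M and 0 <= ny < N
                                    and pbi[nx][ny] and not seen[nx][ny]):
                                seen[nx][ny] = True
                                queue.append((nx, ny))
                sizes.append(size)
    return sizes

def size_histogram(sizes, S):
    """Bin component sizes into the K categories defined by the size
    category vector S: <= S1, (S1, S2], ..., and finally > S(K-1)."""
    Gs = [0] * len(S)
    for s in sizes:
        Gs[bisect_left(S[:-1], s)] += 1
    return Gs
```

With component sizes {1, 1, 2, 4, 4, 7, 9}, size_histogram reproduces both worked examples above.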
Projection Measurements
Another method of analysing the primitive binary images is based on the projection of the 1-valued pixels along the horizontal and vertical directions. Assuming the original input image consists of M (row) x N (column) pixels, we divide the binary image into horizontal and vertical patches, each consisting of an equal number of rows or columns of pixels, as illustrated in Fig. 8. Assuming we divide the image into V (1 ≤ V ≤ N) vertical and H (1 ≤ H ≤ M) horizontal patches, then each vertical patch will consist of INT(N/V) columns of pixels, and each horizontal patch will consist of INT(M/H) rows of pixels, where INT(x) denotes the integer part of x. In the cases where M ≠ iH or N ≠ jV for any integers i and j, the image is expanded by duplicating the last row and last column of pixels as many times as necessary to satisfy the conditions M = iH and N = jV. (This technique can easily be performed by those skilled in the art.)
To describe the geometric property of the primitive binary image, two 1-dimensional arrays are constructed: a 1 x V vertical projection array, GPV = {GPV(1), GPV(2), ..., GPV(V)}, whose elements are the numbers of 1-valued pixels in each of the V vertical patches; and a 1 x H horizontal projection array, GPH = {GPH(1), GPH(2), ..., GPH(H)}, whose elements are the numbers of 1-valued pixels in each of the H horizontal patches. Let PBI(x, y), x = 1, 2, ..., M, y = 1, 2, ..., N be a binary image divided into V vertical and H horizontal patches; then the elements of the horizontal and vertical projection arrays can be calculated as follows:
GPH(j) = ∑_{y=1}^{N} ∑_{x=(j-1)*INT(M/H)+1}^{j*INT(M/H)} PBI(x, y), for j = 1, 2, ..., H

and analogously, with the roles of rows and columns exchanged,

GPV(j) = ∑_{x=1}^{M} ∑_{y=(j-1)*INT(N/V)+1}^{j*INT(N/V)} PBI(x, y), for j = 1, 2, ..., V
Fig. 9 shows an example of calculating the vertical and horizontal projections. The original image is 12 rows x 13 columns (pixels). We want to divide it into 6 horizontal patches and 7 vertical patches. Since 13/7 = 1.86 is not an integer, we duplicate the last column of pixels to make the image 12 rows x 14 columns (pixels). Therefore, each vertical patch consists of two columns of pixels, and each horizontal patch consists of two rows of pixels. It can easily be worked out that GPV = {GPV(1), GPV(2), ..., GPV(7)} = {8, 4, 4, 7, 2, 3, 18}, and GPH = {GPH(1), GPH(2), ..., GPH(6)} = {13, 6, 11, 2, 3, 2}.
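The projection measurement, including the expansion by row/column duplication, may be sketched as follows (function name illustrative):

```python
def projections(pbi, V, H):
    """Compute (GPV, GPH) for a binary image (a list of rows), dividing it
    into V vertical and H horizontal patches. The last row and column are
    duplicated until the dimensions divide evenly, as in the text."""
    img = [list(row) for row in pbi]
    while len(img) % H:                 # expand rows until M = i*H
        img.append(list(img[-1]))
    while len(img[0]) % V:              # expand columns until N = j*V
        for row in img:
            row.append(row[-1])
    rows_per, cols_per = len(img) // H, len(img[0]) // V
    # count 1-valued pixels per vertical patch (column bands) ...
    GPV = [sum(sum(row[j * cols_per:(j + 1) * cols_per]) for row in img)
           for j in range(V)]
    # ... and per horizontal patch (row bands)
    GPH = [sum(sum(row) for row in img[j * rows_per:(j + 1) * rows_per])
           for j in range(H)]
    return GPV, GPH
```

For a 2 x 3 image with V = 2, the third column is duplicated before the patches are summed, mirroring the 13-to-14-column expansion in the Fig. 9 example.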
Image Content Table Generator
After obtaining the geometric property measurements on each of the N primitive binary images, an image content table (ICT) is generated. The ICT is a two-dimensional array formed by arranging the 1-dimensional geometric measurement arrays produced by the image analyser. For different implementations (embodiments) of the image analyser the table will be different, and the construction of the ICT for each of the three measurements described above is illustrated in Figures 10 to 12 respectively.
ICT based on Horizontal and Vertical Line Measurement (Fig. 10)
Let GHL(i) = {GHL(i, 1), GHL(i, 2), ..., GHL(i, K)} and GVL(i) = {GVL(i, 1), GVL(i, 2), ..., GVL(i, K)} be the horizontal and vertical line measurement arrays for the i-th primitive binary image, where i = 1, 2, ..., N, generated by the image analyser based on the method of horizontal and vertical line measurement. The image content table is formed by stacking GHL(i) and GVL(i) to form the i-th column of the 2-dimensional ICT, as shown in Fig. 10.
ICT based on Connected Component Measurement (Fig. 11)
Let Gs(i) = {Gs(i, 1), Gs(i, 2), ..., Gs(i, K)} be the geometric property measurement array generated by the image analyser based on the method of Connected Component Measurement for the i-th primitive binary image. The image content table, ICT, is formed by arranging Gs(i) into the i-th column of the ICT, as shown in Fig. 11.
ICT based on Projection Measurement (Fig. 12)
Let GPV(i) = {GPV(i, 1), GPV(i, 2), ..., GPV(i, V)} and GPH(i) = {GPH(i, 1), GPH(i, 2), ..., GPH(i, H)} be the vertical and horizontal projection arrays generated by the image analyser based on the projection measurement method for the i-th primitive binary image. The image content table is formed by stacking GPV(i) and GPH(i) to form the i-th column of the 2-dimensional ICT, as shown in Fig. 12.
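In all three embodiments the ICT construction is the same stacking operation: the (possibly concatenated) measurement array for the i-th primitive binary image becomes the i-th column of the table. A minimal sketch (function name illustrative):

```python
def build_ict(measurements):
    """Form the image content table from per-PBI measurement arrays.
    measurements[i] is the 1-D array for the i-th primitive binary image
    (e.g. GHL(i) concatenated with GVL(i), or Gs(i), or GPV(i) with
    GPH(i)); it becomes the i-th column of the 2-D table."""
    K = len(measurements[0])
    # transpose: row r of the ICT collects element r of every measurement
    return [[m[r] for m in measurements] for r in range(K)]
```

For example, three PBIs with measurement arrays [1, 2], [3, 4] and [5, 6] yield a 2 x 3 table whose columns are those arrays.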
Image Content Table for Image Database Applications
A principal application of the image content table is in image database management. It can be used in content-based image indexing and retrieval. The scenario is illustrated in Fig. 13.
Construction of Image Database based on Image Content Table
Assume that we have a large collection of digital images consisting of Z images. For each image, an image content table (ICT) is constructed and stored on disk as part of the image database.
Image Database Query based on Pictorial Example
Such a database constructed using our ICT can be used to implement a content-based image query, or query by pictorial example. Suppose that a user has an image and wants to find in the database images that are similar to this query image. An image content table, ICTQ, is first constructed for the query image. Then this ICTQ is compared to the ICTs of the database images. A similarity scorer calculates a metric between the query image's ICT and the ICTs of the database images. One possible implementation of the similarity scorer is as follows.
Let ICTQ = {X(i, j) : i = 1, 2, ..., K, j = 1, 2, ..., N} be the image content table of the query image, and ICTD = {Y(i, j) : i = 1, 2, ..., K, j = 1, 2, ..., N} be the image content table of a database image. The similarity score between the two images can be calculated as follows:
Normalise the ICTs as follows:
NX(i, j) = X(i, j) / ( ∑_{i=1}^{K} ∑_{j=1}^{N} X(i, j) ),   NY(i, j) = Y(i, j) / ( ∑_{i=1}^{K} ∑_{j=1}^{N} Y(i, j) )
Calculate the score S as a distance between the normalised tables NX and NY.
In this way, each image in the database can be given a similarity score. All the scores may be sorted in increasing order (the smaller the score, the more similar the images) and returned to the user. From this sorted list, the user can identify the images in the database that are most similar to the query.
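The exact distance formula is not reproduced here; as one plausible instance consistent with the stated convention (smaller score means more similar), the following sketch uses the sum of absolute differences between the normalised tables. The choice of L1 distance is an assumption, not part of the specification:

```python
def similarity_score(ict_q, ict_d):
    """Score two image content tables (2-D lists of equal shape).
    Each table is normalised to sum to 1, then compared element-wise.
    The L1 distance used here is an assumption: the text states only
    that smaller scores indicate more similar images."""
    def normalise(t):
        total = sum(sum(row) for row in t) or 1   # guard an all-zero table
        return [[v / total for v in row] for row in t]
    nq, nd = normalise(ict_q), normalise(ict_d)
    return sum(abs(a - b)
               for rq, rd in zip(nq, nd)
               for a, b in zip(rq, rd))
```

Identical tables score 0, and the score grows as the distributions of measurements diverge, so sorting database images by ascending score ranks them from most to least similar.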
Application of Image Content Table to General Object Recognition
The image content table has more general applications in machine-vision-based automatic object recognition. In this case, an object is first imaged using a sensor, e.g. a CCD camera. The image of the object is decomposed, e.g. using a pre-designed colour palette table, into a set of primitive binary images, and an image content table is constructed to represent the image (object). Using this representation, general automatic object recognition systems can be constructed.
Fig. 14 illustrates how the ICT can be used in constructing object class prototypes/models for recognition. Each sample object of known class identity is imaged and represented using an ICT, which can be regarded as a feature vector (pattern). Well-known pattern recognition techniques can then be used to construct object class prototypes/models using an appropriate recognition strategy.
Fig. 15 illustrates the use of the object class prototype/model ICTs to recognise unknown objects. An unknown object is first imaged, and the image of the unknown object is then represented in image content table form and presented to the same pattern recognition algorithms, which use the object prototype/model information to identify the class of the unknown object automatically.
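As the specification leaves the recognition strategy open, one simple instance is a nearest-prototype rule: the unknown object is assigned the class whose prototype ICT scores most similar (lowest) under a chosen similarity measure. A minimal sketch, with names illustrative:

```python
def classify(ict_unknown, prototypes, score):
    """Nearest-prototype recognition. prototypes maps class identity ->
    prototype ICT; score(a, b) returns a similarity score where smaller
    means more similar. The minimum-score rule is an assumption; the
    text leaves the recognition strategy to the implementer."""
    return min(prototypes, key=lambda c: score(ict_unknown, prototypes[c]))
```

Any of the pattern recognition techniques alluded to above (e.g. k-nearest neighbours over ICT feature vectors) could be substituted for this rule.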
Other Imaging Applications
Other applications can also be envisaged involving the use of a representation that uses a feature palette to decompose the input image into primitive binary images, measures the geometric properties of those primitive binary images, and tabulates the frequencies of co-occurrence of the geometric measures and the palette features.