WO2009026965A1

WO2009026965A1 - Image classification system and method

Info

Publication number: WO2009026965A1
Application number: PCT/EP2007/059088
Authority: WO
Inventors: Antony Louis Piriyakumar Douglas; Ashish Sharma
Original assignee: Siemens Building Technologies Fire & Security Products Gmbh & Co.Ohg
Priority date: 2007-08-31
Filing date: 2007-08-31
Publication date: 2009-03-05

Abstract

The present invention relates to classifying digital images using support vector machines. In order to improve image classification using support vector machines the proposed system comprises a support vector machine (14) which includes means (20) for supplying a plurality of classified digital images (22) to the support vector machine (14), means (24) for picking pixel values (18) from the image, from said plurality of images, in a pre-specified pattern, means (26) for forming a feature vector (28) from the picked pixel values (18), means (30) for classifying the feature vector (28) according to the respective classified image (22), means (32) for obtaining support vectors (34) from the feature vectors (28), the support vectors (34) defining a hyperplane condition for classifying the digital images, means (36) for classifying the feature vector (28) by testing it on the hyperplane condition and means (38) for classifying the image depending on the feature vector classification.

Description

Image classification system and method

The present invention relates to classifying digital images using support vector machines.

Classification of images based on the presence or absence of a particular object in them is used in surveillance, secure access and human/computer interface systems. Object detection systems, for example support vector machines, usually detect particular features of objects.

Support vector machines (henceforth referred to as SVMs) are a class of learning algorithms for classification that are particularly useful for high-dimensional input data with either large or small training sets. SVMs suited to object detection problems work by mapping the input features of the images into a higher-dimensional feature space and computing functions on those mapped features in that space. During training, an SVM creates feature vectors from a set of already classified training images. From these feature vectors the SVM obtains a set of support vectors, which define the condition used to classify images. This condition is usually called the hyperplane condition. For classification of images, also called testing, a feature vector is formed from an image and tested on the hyperplane condition, which thereby classifies the image. Presently, SVMs use computationally intensive feature vector extraction methods and complex algorithms, which increase the memory requirements and execution time of the SVMs.
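As a hedged illustration of the testing step just described, the sketch below evaluates the usual SVM decision value, the sign of the sum of y_i * alpha_i * K(v_i, f) plus a bias b, for a feature vector f. The function names, the bias term and the `gamma` parameter are assumptions for illustration; the Gaussian (RBF) kernel stands in for the "mapping into a higher-dimensional feature space", which it performs implicitly, while the linear kernel is a plain dot product.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """Plain dot product: no mapping into a higher-dimensional space."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u: Vector, v: Vector, gamma: float = 0.5) -> float:
    """Gaussian kernel: a dot product in an implicit, high-dimensional space."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def decide(support_vectors: Sequence[Vector], labels: Sequence[int],
           alphas: Sequence[float], bias: float, f: Vector,
           kernel: Kernel = linear_kernel) -> int:
    """Test feature vector f on the hyperplane condition; return +1 or -1."""
    score = sum(y * a * kernel(v, f)
                for v, y, a in zip(support_vectors, labels, alphas)) + bias
    return 1 if score >= 0 else -1
```

Swapping `rbf_kernel` for `linear_kernel` only changes the similarity measure; the support vectors, their classes and their coefficients keep the same roles.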

It is an object of the present invention to improve image classification using support vector machines. The above object is achieved by a method for training a support vector machine for classifying digital images, each image including a plurality of pixels, the training comprising the steps of:

-supplying a plurality of classified images to the support vector machine,

-picking pixel values from an image, from said plurality of classified images, in a pre-specified pattern,

-forming a feature vector from the picked pixel values,

-classifying the feature vector according to the respective classified image,

-repeating the above three steps for at least part of the plurality of images, and

-obtaining support vectors from the feature vectors using the support vector machine, the support vectors defining a hyperplane condition for classifying the images.

The above object is also achieved by a method for classifying a digital image, the image including a plurality of pixels, comprising the steps of:

-supplying the digital image to a trained support vector machine, wherein the trained support vector machine includes support vectors defining a hyperplane condition for classifying the images,

-picking pixel values from the image in a pre-specified pattern,

-forming a feature vector from the picked pixel values,

-classifying the feature vector by testing it on the hyperplane condition, and

-classifying the image depending on the feature vector classification.

The above object is also achieved by a system for classifying digital images, each image including a plurality of pixels, comprising a support vector machine, including:

-means for supplying a plurality of classified digital images to the support vector machine;

-means for picking pixel values from the image, from said plurality of images, in a pre-specified pattern;

-means for forming a feature vector from the picked pixel values;

-means for classifying the feature vector according to the respective classified image;

-means for obtaining support vectors from the feature vectors, the support vectors defining a hyperplane condition for classifying the digital images;

-means for classifying the feature vector by testing it on the hyperplane condition; and

-means for classifying the image depending on the feature vector classification.

The underlying idea of the present invention is to use only part of the information in the image, namely the pixel values of selected pixels instead of all the pixels, to form a feature vector for image classification. The pixels are selected in a pre-specified pattern such that the whole image is broadly covered and the extracted information is sufficient for further processing by the SVM. This reduces the size of the feature vector, which in turn reduces the memory required to store it and the time required to process it.
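To make the size reduction concrete, the hedged sketch below counts how many pixels a regular grid visits when it starts at the top-left pixel and advances by a fixed step in each direction. The 640 x 480 image size and the step of 4 used in the usage lines are hypothetical numbers chosen for illustration, not values taken from the patent.

```python
import math


def picked_pixel_count(height: int, width: int, step_rows: int, step_cols: int) -> int:
    """Pixels visited by a regular grid starting at (0, 0) with the given steps."""
    return math.ceil(height / step_rows) * math.ceil(width / step_cols)


def reduction_factor(height: int, width: int, step_rows: int, step_cols: int) -> float:
    """How many times smaller the feature vector is than the full pixel array."""
    return (height * width) / picked_pixel_count(height, width, step_rows, step_cols)


# Hypothetical example: a 640 x 480 luminance image sampled every 4th row and column.
print(picked_pixel_count(480, 640, 4, 4))  # 19200 values instead of 307200
print(reduction_factor(480, 640, 4, 4))    # 16.0
```

With one 8-bit luminance value per picked pixel, such a feature vector would occupy 19,200 bytes rather than 307,200 for the full luminance plane.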

In a preferred embodiment of the present invention the support vectors defining the hyperplane condition are obtained from a plurality of feature vectors representing a plurality of classified images, the feature vectors being formed by picking pixel values from the images in the pre-specified pattern. The same pre-specified pattern is used both for training the support vector machine and for classifying images. This reduces the memory requirements and computation time both while training the support vector machine and while classifying images with it.

In a further preferred embodiment of the present invention, the digital images are classified depending on the presence or absence of a particular object in the digital image. The method is advantageous in surveillance and security systems for detecting the presence or absence of an object in the area under observation.

In a further preferred embodiment of the present invention the pixel values include luminance values of the respective pixels. In most cases, the luminance values selected in the pre-specified pattern are alone sufficient to detect the shape of an object in an image. Using only these values reduces the memory required to store and process the information collected from each pixel.
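The patent does not say how the luminance values are obtained. Assuming 8-bit RGB input, one common choice is the ITU-R BT.601 luma weighting Y = 0.299 R + 0.587 G + 0.114 B; the sketch below uses it to reduce each pixel from three values to one. The weights, the RGB input format and the function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def luminance(pixel: RGB) -> int:
    """8-bit luma using ITU-R BT.601 weights (an assumed choice of weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def luminance_plane(image: Sequence[Sequence[RGB]]) -> List[List[int]]:
    """Convert a row-major RGB image into a 2-D array of luminance values."""
    return [[luminance(p) for p in row] for row in image]
```

Storing one value per pixel instead of three is where the memory saving described above comes from.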

In a further preferred embodiment of the present invention the support vectors obtained from the feature vectors using the support vector machine are combined into at least one support vector defining the hyperplane condition for classifying the digital images using a linear kernel. This decreases the number of support vectors and thereby reduces the memory space and the computational time required for classifying an image. The use of a linear kernel instead of complex kernels further reduces the computation time to classify the images.

In a further preferred embodiment of the present invention the support vectors obtained from the feature vectors using the support vector machine further comprise numerical coefficients, such that each numerical coefficient is associated with a support vector. These numerical coefficients can be weighted coefficients associated with corresponding support vectors and can be used to combine the support vectors into a single support vector using the linear kernel.

In a further preferred embodiment of the present invention the pre-specified pattern is a regular pattern. The regular pattern takes values from fewer pixels while still broadly covering the whole image. This reduces the amount of information required to form the feature vector, and thereby the memory requirements, without losing the accuracy of the method.

The present invention is further described hereinafter with reference to preferred embodiments shown in the accompanying drawings, in which:

FIG 1 is a schematic overview of a system for classifying images using a support vector machine;

FIG 2 is a diagram showing a digital image and its pixels;

FIG 3 is a schematic illustration of a system for classifying digital images using a support vector machine, according to a particular embodiment of the present invention;

FIG 4 is a flowchart illustrating a method for training a support vector machine for classifying digital images, according to a particular embodiment of the present invention; and

FIG 5 is a flowchart illustrating a method for using the trained support vector machine for classifying digital images, according to a particular embodiment of the present invention.

Referring to FIG 1, an overview of a system 10 for classifying digital images 12 using a support vector machine 14 is illustrated according to one embodiment of the present invention, wherein a digital image 12 is supplied to the support vector machine 14 as input and the SVM classifies the image. The classification is done based on the presence or absence of a particular object in the image. Before describing the proposed system and method to classify an image, some of the terminology used herein will be explained.

An analog or continuous parameter image such as a still photograph may be represented as a matrix of digital values (also called pixel values) and stored in a storage device, such as that of a computer, an embedded system, or other digital processing device. Thus, as described herein, the matrix of pixel values is generally referred to as a "digital image" or more simply an "image" and may be stored in a digital data storage device, such as a memory, as an array of numbers representing the spatial distribution of energy at different wavelengths in a scene.

Similarly, an image sequence, for example a view of a moving pedestrian, may be converted to a digital video signal as is generally known. The digital video signal is provided as a sequence of discrete digital images or frames. Each frame or image may be represented as a matrix of digital data values which may be stored in a storage device, such as that of a computer, an embedded system or other digital processing device. Thus, in the case of video signals, as described herein, a matrix of digital data values is generally referred to as an "image frame" or more simply an "image" or a "frame". Similarly, a live image window contains an exposure frame that outlines the area to be captured in the final digital image. The size of this frame corresponds to the selected image size (chosen in the Photo panel). When an image size smaller than the full screen size is chosen, only the portion of the screen bounded by the exposure frame is included in the image. The exposure frame can be moved within the boundaries of the live image window by clicking anywhere within the frame and dragging with the mouse. In this manner, users can select a specific portion of the window.

The center of the live image window also contains a digital focus mark that denotes the area selected for focus. Henceforth, the terms window, frame and digital image are all referred to as 'image' in the present application, with the same meaning.

Whether provided from a still photograph or a video sequence, each of the numbers in the array corresponds to a digital word (e.g. an eight-bit binary value) typically referred to as a "pixel value" or as "image data". Thus the image is represented by a two dimensional array of pixels with each of the pixels represented by a pixel value comprising a digital word.
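As a hedged sketch of this representation, the class below stores a grayscale image as a two-dimensional array of 8-bit words, laid out row by row in a single byte buffer. The class name `GrayImage` and the row-major layout are assumptions for illustration; the patent only states that each pixel value is a digital word such as an eight-bit value.

```python
from typing import Tuple


class GrayImage:
    """Row-major 2-D array of 8-bit pixel values (a hypothetical helper class)."""

    def __init__(self, height: int, width: int) -> None:
        self.height, self.width = height, width
        self.data = bytearray(height * width)  # one 8-bit word per pixel, all zero

    def _index(self, i: int, j: int) -> int:
        """Map (row, column) to the position in the flat buffer."""
        if not (0 <= i < self.height and 0 <= j < self.width):
            raise IndexError("pixel position outside the image")
        return i * self.width + j

    def __getitem__(self, pos: Tuple[int, int]) -> int:
        return self.data[self._index(*pos)]

    def __setitem__(self, pos: Tuple[int, int], value: int) -> None:
        # bytearray itself rejects values outside 0..255 with ValueError
        self.data[self._index(*pos)] = value
```

For example, `img = GrayImage(2, 3)` followed by `img[1, 2] = 200` writes the last byte of the six-byte buffer.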

FIG 2 shows a digital image 12 comprising a plurality of pixels 16 arranged in matrix form. Each pixel 16 has luminance and chrominance values associated with it, collectively called pixel values. The image 12 can be a portion of a further digital image, for example a window or a frame of a video, or it can be an analog image converted into digital format before being supplied to the SVM.
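When the image 12 is a window of a larger frame, it can be cut out before feature extraction. The sketch below is a hedged illustration that assumes the frame is a list of rows and that the window is given by its top-left corner and size; the function name and parameters are hypothetical.

```python
from typing import List, Sequence


def crop_window(frame: Sequence[Sequence[int]], top: int, left: int,
                height: int, width: int) -> List[List[int]]:
    """Return the height x width sub-image whose top-left corner is (top, left)."""
    if top < 0 or left < 0 or top + height > len(frame) or left + width > len(frame[0]):
        raise ValueError("window does not fit inside the frame")
    return [list(row[left:left + width]) for row in frame[top:top + height]]
```

The cropped window can then be handled exactly like any other image 12.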

The SVM 14 has to be trained before it is used to classify images. It is trained by supplying a plurality of classified images and establishing a criterion for classifying images. Classified images 22 are also called labelled images; both terms are used with the same meaning throughout the application. The classified images are pre-classified depending on the presence or absence of a particular object in them.

Referring to FIG 3, the system 10 comprises a support vector machine 14 which comprises means 20 for supplying classified images 22 to the support vector machine 14, means 24 for picking pixel values 18 from a digital image 12 in a pre-specified pattern, means 26 for forming a feature vector 28 from the picked pixel values 18, means 30 for classifying the feature vector 28 according to the respective classified image 22, means 32 for obtaining support vectors 34 from the feature vectors 28, the support vectors 34 defining a hyperplane condition for classifying the digital images, means 36 for classifying the feature vector 28 by testing it on the hyperplane condition, and means 38 for classifying the image 22 depending on the feature vector classification.

Referring to FIG 4, the method 50 of training the support vector machine 14 begins with step 52, in which an image from the set of classified images 22 is supplied to the SVM 14. The set comprises known images which are pre-classified based on the presence or absence of a particular object in the image. In the present embodiment of the invention the plurality of images 22 are supplied to the SVM one by one. At step 54 pixel values 18 are picked from the supplied classified image 22 in a pre-specified pattern. In the present embodiment of the invention the pre-specified pattern is a regular pattern. The picking of pixel values 18 is henceforth also referred to as selection of pixels. In the present embodiment of the invention, filters are applied to the image. A filter is a rectangular mask which is applied over the image; the filtered value is the output of a mathematical operation called 'convolution'. Two parameters (filter shift width, shift width) are used for selecting the next pixel on the same row, and two parameters (filter shift height, shift height) for selecting the next row. For example:

Let $P(i, j)$ be the current pixel. The next pixel on the same row is $P(i, j + \delta_j)$, where

$$\delta_j = \frac{\text{filter shift width}}{\text{shift width}}$$

After completing the row, the next row is selected as $P(i + \delta_i, 0)$, where

$$\delta_i = \frac{\text{filter shift height}}{\text{shift height}}$$

and 0 denotes the first column.
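The pixel walk above can be sketched as follows. The patent does not state where the walk starts, how a non-integer ratio is handled, or exactly how the filter output relates to the picked values, so this sketch assumes a start at P(0, 0), integer steps of at least 1, and a "valid" convolution whose mask is anchored at its top-left corner. All function names are hypothetical.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]


def step_size(filter_shift: int, shift: int) -> int:
    """delta = (filter shift) / (shift); assumed here to divide evenly and be >= 1."""
    delta, remainder = divmod(filter_shift, shift)
    if remainder or delta < 1:
        raise ValueError("expected a positive integer step")
    return delta


def pick_pixels(image: Image, filter_shift_width: int, shift_width: int,
                filter_shift_height: int, shift_height: int) -> List[int]:
    """Walk P(i, j) -> P(i, j + dj) along a row, then jump to P(i + di, 0)."""
    dj = step_size(filter_shift_width, shift_width)
    di = step_size(filter_shift_height, shift_height)
    picked = []
    i = 0  # assumed starting point: the top-left pixel P(0, 0)
    while i < len(image):
        j = 0  # every new row starts at the first column
        while j < len(image[i]):
            picked.append(image[i][j])
            j += dj
        i += di
    return picked


def convolve_at(image: Image, mask: Sequence[Sequence[int]], i: int, j: int) -> int:
    """Filtered value at (i, j): 2-D convolution (mask flipped) over the mask's footprint."""
    kh, kw = len(mask), len(mask[0])
    return sum(image[i + m][j + n] * mask[kh - 1 - m][kw - 1 - n]
               for m in range(kh) for n in range(kw))
```

For a symmetric mask the flip makes no difference; it matters only for asymmetric masks, where it distinguishes convolution from correlation.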

In most cases the luminance values selected in a pre-specified pattern are alone sufficient to detect the shape of an object in an image. The present embodiment of the invention therefore uses only the luminance values of the respective pixels as the pixel values 18.

In step 56 a feature vector 28 is formed from the picked pixel values 18; it represents the corresponding image in the subsequent steps in the SVM.

At step 58 the feature vector 28 is classified according to the classification of its corresponding classified image 22. Since a pre-classified image 22 is given as input and the feature vector 28 is obtained from that image, the feature vector 28 receives the same classification as the image 22. The classified feature vector is stored in the memory of the SVM. Steps 52 to 58 are repeated for all the classified images 22: at step 60 it is checked whether all of the input classified images 22 have been converted into their corresponding feature vectors 28. If so, the next step is carried out; otherwise the next image from the plurality of classified images 22 is supplied as input, until all the classified images have been converted. Thus the plurality of classified images 22 is turned into a set of classified feature vectors, each representing the corresponding image.
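Steps 52 to 60 can be sketched as a simple loop. This is a hedged illustration, not the patent's implementation: it assumes each classified image arrives as a (luminance image, object-present flag) pair, uses the regular pattern "every di-th row, every dj-th column starting at (0, 0)", and encodes the class as +1 or -1. The names `feature_vector` and `build_training_set` are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Image = Sequence[Sequence[int]]


def feature_vector(image: Image, di: int, dj: int) -> List[float]:
    """Steps 54-56: luminance values on a regular grid (every di-th row, dj-th column)."""
    return [float(v) for row in image[::di] for v in row[::dj]]


def build_training_set(classified_images: Iterable[Tuple[Image, bool]],
                       di: int, dj: int) -> Tuple[List[List[float]], List[int]]:
    """Steps 52-60: one labelled feature vector per pre-classified image."""
    features: List[List[float]] = []
    labels: List[int] = []
    for image, object_present in classified_images:  # step 52: next classified image
        features.append(feature_vector(image, di, dj))
        labels.append(1 if object_present else -1)  # step 58: inherit the image's class
    return features, labels  # step 60: the loop ends once every image is converted
```

The resulting `features` and `labels` lists are what step 62 hands to the SVM solver.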

In step 62 a set of support vectors 34, together with numerical coefficients such that each numerical coefficient is associated with a support vector 34, is obtained from the classified feature vectors to form the criterion for classifying an image. The support vectors 34 define a condition called the hyperplane condition, which is used to classify an image. In the present embodiment of the invention the support vectors 34 are combined into one support vector by using a linear kernel. A linear kernel is the plain dot product of two vectors, K(x, y) = x · y. It performs no mapping into a higher-dimensional space, so the resulting classifier is a linear function of the feature vector; because the dot product is linear, the weighted support vectors can be summed into a single vector in advance. The support vectors are combined into a single support vector using a linear kernel as follows:

Let $V_i$ be the $i$-th support vector, $\alpha_i$ the numerical coefficient associated with $V_i$, $y_i$ the class of $V_i$, $f$ a feature vector, and $S_1$ the single support vector obtained by combining the support vectors using the linear kernel. Because the linear kernel is the dot product, the weighted sum over all support vectors can be regrouped into a single dot product:

$$\sum_i y_i\,\alpha_i\,(V_i \cdot f) = \Big(\sum_i y_i\,\alpha_i\,V_i\Big) \cdot f = S_1 \cdot f, \qquad S_1 = \sum_i y_i\,\alpha_i\,V_i$$

where $y_i = +1$ if the object is present in the image and $y_i = -1$ if it is absent.

The single support vector S₁ so obtained forms the criterion of image classification by defining the hyperplane condition. Step 62 completes the training of the SVM and after that the SVM is ready for classifying any image by using the hyperplane condition.
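The patent does not say how the SVM computes the support vectors and coefficients in step 62. As one possible sketch, the code below swaps in a dual coordinate descent solver for a linear, hinge-loss SVM without a bias term (the patent's combination formula also has no bias), and then combines the support vectors into S₁ = Σ y_i α_i V_i. The box constraint `c`, the number of `epochs` and the `seed` are hypothetical parameters; the function names are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def combine_support_vectors(support_vectors: Sequence[Vector], labels: Sequence[int],
                            alphas: Sequence[float]) -> List[float]:
    """S_1 = sum_i y_i * alpha_i * V_i; valid only because the kernel is linear."""
    combined = [0.0] * len(support_vectors[0])
    for v, y, a in zip(support_vectors, labels, alphas):
        for k, value in enumerate(v):
            combined[k] += y * a * value
    return combined


def train_linear_svm(features: Sequence[Vector], labels: Sequence[int], c: float = 1.0,
                     epochs: int = 50, seed: int = 0) -> Tuple[List[int], List[float], List[float]]:
    """Dual coordinate descent for a hinge-loss linear SVM without a bias term.

    Returns the indices of the support vectors, their coefficients alpha_i and
    the weight vector w, which the loop keeps equal to sum_i y_i * alpha_i * x_i.
    """
    n, dim = len(features), len(features[0])
    alphas = [0.0] * n
    w = [0.0] * dim
    q = [sum(v * v for v in x) for x in features]  # diagonal entries x_i . x_i
    order = list(range(n))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            if q[i] == 0.0:
                continue  # an all-zero feature vector contributes nothing to w
            x, y = features[i], labels[i]
            grad = y * sum(wk * xk for wk, xk in zip(w, x)) - 1.0
            new_alpha = min(max(alphas[i] - grad / q[i], 0.0), c)  # clip to [0, C]
            delta = new_alpha - alphas[i]
            if delta != 0.0:
                alphas[i] = new_alpha
                for k in range(dim):
                    w[k] += delta * y * x[k]
    support = [i for i in range(n) if alphas[i] > 0.0]
    return support, [alphas[i] for i in support], w
```

A usage sketch on toy two-dimensional feature vectors:

```python
X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]
support, coeffs, w = train_linear_svm(X, y)
s1 = combine_support_vectors([X[i] for i in support], [y[i] for i in support], coeffs)
```

Here `s1` matches `w` up to rounding, so classifying an image costs one dot product with S₁ instead of one kernel evaluation per support vector.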

Referring to FIG 5, the method 70 for classifying images starts with step 72 of supplying a digital image 12 to the trained support vector machine 14, which already has a hyperplane condition defined by support vectors 34 for image classification. In the next step 74, pixel values 18 are picked from the image in the pre-specified pattern. In the present embodiment of the invention this pattern is the same as the pre-specified pattern used while training the SVM, and the pixel values are again the luminance values of the corresponding pixels.

At step 76 a feature vector 28 is formed from the picked pixel values 18. The feature vector 28 represents the image for further processing in the SVM. In step 78 the feature vector 28 is tested on the hyperplane condition and the image is classified depending on the feature vector classification.
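Steps 74 to 78 can be sketched as below, assuming training produced a single combined vector S₁ and optionally a bias. The bias term, the grid parameters `di`/`dj`, and the convention that a score of exactly zero counts as "object present" are all assumptions for illustration; the function names are hypothetical.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]


def feature_vector(image: Image, di: int, dj: int) -> List[float]:
    """Steps 74-76: same regular pattern as in training (every di-th row, dj-th column)."""
    return [float(v) for row in image[::di] for v in row[::dj]]


def classify_image(image: Image, s1: Sequence[float], di: int, dj: int,
                   bias: float = 0.0) -> bool:
    """Step 78: True if the hyperplane test says the object is present."""
    f = feature_vector(image, di, dj)
    if len(f) != len(s1):
        raise ValueError("image size or pattern differs from the one used in training")
    score = sum(w * x for w, x in zip(s1, f)) + bias
    return score >= 0.0
```

The length check reflects why the training and classification patterns must match: a different pattern yields a feature vector that cannot be tested against S₁.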

Summarizing, the present invention relates to classifying digital images using support vector machines. In order to improve image classification using support vector machines the proposed system comprises a support vector machine 14 which includes means 20 for supplying a plurality of classified digital images 22 to the support vector machine 14, means 24 for picking pixel values 18 from the image, from said plurality of images, in a pre-specified pattern, means 26 for forming a feature vector 28 from the picked pixel values 18, means 30 for classifying the feature vector 28 according to the respective classified image 22, means 32 for obtaining support vectors 34 from the feature vectors 28, the support vectors 34 defining a hyperplane condition for classifying the digital images, means 36 for classifying the feature vector 28 by testing it on the hyperplane condition and means 38 for classifying the image depending on the feature vector classification.

Although the invention has been described with reference to specific embodiments, this description is not meant to be construed in a limiting sense. Various modifications of the disclosed embodiments, as well as alternate embodiments of the invention, will become apparent to persons skilled in the art upon reference to the description of the invention. It is therefore contemplated that such modifications can be made without departing from the spirit or scope of the present invention as defined.

Claims


1. A method for training a support vector machine (14) for classifying digital images (12), each image including a plurality of pixels (16), the training comprising the steps of: a) supplying a plurality of classified images (22) to the support vector machine (14); b) picking pixel values (18) from an image, from said plurality of classified images (22), in a pre-specified pattern; c) forming a feature vector (28) from the picked pixel values (18); d) classifying the feature vector (28) according to the respective classified image (22); e) repeating steps b, c and d for at least part of the plurality of images; and f) obtaining support vectors (34) from the feature vectors (28) using the support vector machine, the support vectors (34) defining a hyperplane condition for classifying the images.

2. The method according to claim 1, wherein the digital images (12) are classified depending on the presence or absence of a particular object in the digital image.

3. The method according to any of the preceding claims, wherein the pixel values (18) include luminance values of the respective pixels (16).

4. The method according to any of the preceding claims, wherein the support vectors (34) obtained from the feature vectors (28) using the support vector machine (14) are combined into at least one support vector defining the hyperplane condition for classifying the digital images using a linear kernel.

5. The method according to any of the preceding claims, wherein the support vectors (34) obtained from the feature vectors (28) using the support vector machine (14) further comprise numerical coefficients, such that each numerical coefficient is associated with a support vector.

6. The method according to any of the preceding claims, wherein the pre-specified pattern is a regular pattern.

7. A method for classifying a digital image, the image (12) including a plurality of pixels (16), comprising steps of: supplying the digital image (12) to a trained support vector machine (14), wherein the trained support vector machine (14) includes support vectors (34) defining a hyperplane condition for classifying the images; picking pixel values (18) from the image (12) in a pre-specified pattern; forming a feature vector (28) from the picked pixel values (18); classifying the feature vector (28) by testing it on the hyperplane condition; and classifying the image depending on the feature vector classification.

8. The method according to claim 7, wherein the support vectors (34) defining the hyperplane condition are obtained from a plurality of feature vectors (28) representing a plurality of classified images (22), the feature vectors (28) formed by picking pixel values (18) from the images in the pre-specified pattern.

9. The method according to claim 7 or 8, wherein the pixel values (18) include luminance values of the respective pixels (16).

10. The method according to any of claims 7 to 9, wherein the hyperplane condition classifies the digital image (12) depending on the presence or absence of a particular object in the digital image (12).

11. The method according to any of the claims 7 to 10, wherein the digital image (12) is a portion of a further digital image.

12. The method according to any of the claims 7 to 11, wherein the pre-specified pattern is a regular pattern.

13. A system (10) for classifying digital images, each image (12) including a plurality of pixels (16), comprising a support vector machine (14), including: a) means (20) for supplying a plurality of classified digital images (22) to the support vector machine (14); b) means (24) for picking pixel values (18) from the image, from said plurality of images, in a pre-specified pattern; c) means (26) for forming a feature vector (28) from the picked pixel values (18); d) means (30) for classifying the feature vector (28) according to the respective classified image (22); e) means (32) for obtaining support vectors (34) from the feature vectors (28), the support vectors (34) defining a hyperplane condition for classifying the digital images; f) means (36) for classifying the feature vector (28) by testing it on the hyperplane condition; and g) means (38) for classifying the image depending on the feature vector classification.