Embodiment
The present invention is described in further detail below in conjunction with the drawings and specific embodiments.
As shown in Figure 1, the input of the method for computing image object size constancy is a single two-dimensional upright image; the output is the relative perceived size of each object in the image along one dimension in a specified direction (generally the horizontal or vertical direction). The camera model is the pinhole imaging model. An upright image is one in which the image sky lies above the image medium line and the image ground lies below it.
According to the size constancy theory, achieving constancy perception of the relative size of each object in the image requires correctly computing the imaging visual angle A and the relative perceived depth D of each image object. The imaging visual angle A can be expressed by the one-dimensional size of the object in the image, that is, by the number of pixels it covers along a given direction in the image. For an image object with a given contour, this is easily computed. We assume that the contours of all image objects are given manually; in our computations, the parameters of all image objects are obtained interactively with the functions ginput(n) and imcrop(I) provided by the MATLAB environment.
The remaining task is to compute the relative depth D of each image object. Drawing on conclusions from visual psychology concerning human depth-perception cues, we propose a simple and effective solution, whose principle is shown in Figure 2. First, using the two depth cues of object height in the image and aerial perspective, the medium line L1 is computed with a sky-detection technique, and the ground portion of the image is separated from the whole image. Second, within the image ground portion, the two depth cues of linear perspective and texture gradient are used to compute the straight line L2 of fastest depth change, running from the bottom edge of the image to the medium line. The intersection point V(Vx, Vy) of L2 with L1 is the point of maximum perceived depth in the image, i.e., the vanishing point. The intersection point U(Ux, Uy) of L2 with the bottom edge of the image ground is the point of minimum perceived depth in the image, which we call the near point. Psychological studies show that, within a certain range, the human visual system's perception of image depth changes linearly. From the near point U toward the image medium line, along the line L2 of fastest depth change, the image depth value increases linearly until it reaches its maximum at the vanishing point V. Finally, the relative perceived depth map of the image ground is computed: all points on any straight line perpendicular to L2 have the same depth. If a straight line L3 passes through a point P(m, n) and is perpendicular to L2, then every point on L3 has the same perceived depth as P. The relative perceived depth of P can therefore be expressed by the distance D_U-L3 from the near point U to L3. In this way the relative perceived depth of every point on the image ground can be computed automatically, forming a dense relative perceived depth map.
Once the imaging visual angle A and the relative perceived depth D of each object have been obtained, the computer can carry out the size constancy computation for the image objects, using the following formula:
S=B×A×D (1)
where S is the perceived size of the object; A is the imaging visual angle of the object, expressed by the one-dimensional size of the object in the image; D is the perceived depth of the object (also called the perceived distance), i.e., the distance from the camera, as perceived by the human visual system, of the object in the image at imaging time; and B is an imaging coefficient related to the eye (camera), which for one and the same image is identical for all objects.
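As a minimal sketch of formula (1), assuming that A is a pixel count along the chosen direction and that B is constant within one image (so it cancels for relative sizes), the computation might look as follows; the function name and default values are illustrative, not part of the original method:

```python
def perceived_size(A, D, B=1.0):
    """Perceived size S = B * A * D (formula (1)).

    A: imaging visual angle, here the number of pixels the object
       covers along the chosen direction in the image.
    D: relative perceived depth of the object.
    B: imaging coefficient; identical for all objects in one image,
       so B = 1 suffices when only relative sizes are needed.
    """
    return B * A * D

# Two objects covering the same 50 pixels, one perceived at twice the
# depth of the other, are perceived as differing in size by a factor of two.
near = perceived_size(A=50, D=1.0)
far = perceived_size(A=50, D=2.0)
```

This is exactly the size constancy correction: the deeper object's identical retinal (pixel) extent is scaled up by its perceived depth.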
The key steps of the relative size constancy computation process shown in Figure 1 are elaborated below.
1. Computing the medium line L1
An outdoor depth image generally contains both a ground portion below and a sky portion above, and an indoor depth image likewise generally contains both a floor portion below and a ceiling portion above. We refer to the sky portion of an outdoor image and the ceiling portion of an indoor image as the image sky, to the ground portion and the floor portion as the image ground, and to the boundary between the image sky and the image ground as the medium line. Some images have no medium line; such images contain only a ground portion.
The image sky (including a ceiling) has good color consistency and a relatively simple layout. Using this characteristic, the sky can be separated by image segmentation. Because, comparatively speaking, the hue (H) component describes color in the manner closest to human vision, the RGB space is first converted to the HSI space. Since all images to be processed are upright, the sky region must lie in the upper half of the image, so only the upper half is analyzed: a one-dimensional hue histogram is computed, and the H value corresponding to the histogram bin with the maximum count is taken as the H value of the sky, denoted H_SKY. To improve computation speed and avoid isolated ground points being mistaken for sky, the image is divided into 2×2 blocks, each with an H value equal to the mean of its four pixels. For any image block W with H value H_W, if |H_SKY − H_W| <= T_I × H_SKY, then block W belongs to the sky. T_I is a similarity threshold, set to 0.05 by experiment. The sky classification is carried out over the entire image; if the computed sky area is less than 5% of the image, we conclude that the image contains no sky. In each image column, the lowest sky block marks the sky-ground boundary, and the resulting boundary points are fitted to a horizontal straight line by least squares. This horizontal line is the medium line L1.
When the image contains no sky, the medium line degenerates to the top edge of the image or to one of its two side edges. Because all images are upright, the medium line cannot lie on the bottom edge of the image. In this case, the position of the medium line is determined by the position of the vanishing point and the line L2 of fastest depth change. When the image contains sky, the image ground is the region bounded by the medium line, the bottom edge, and the two side edges; when it contains no sky, the image ground is the entire image.
2. Computing the line L2 of fastest ground depth change
From psychology it is known that the two depth cues of linear perspective and texture gradient can be used to indicate the direction of fastest depth change on the ground. These two cues are effective only on the image ground, so the support region for computing the line L2 is restricted to the image ground. Using the linear perspective cue alone, a straight line of fastest depth change running from the bottom edge of the image to the medium line can be computed; we call this line the linear perspective line LP. Using the texture gradient cue alone, another such line from the bottom edge to the medium line can be computed; we call it the texture gradient line LT. The methods for computing LP and LT are introduced later; for now, assume both lines have been obtained. In general the two lines do not coincide, so when jointly indicating the direction of fastest ground depth change they inevitably conflict. Since both lines are produced by least-squares fitting, we may assume that the larger a line's relative fitting error, the less accurate the direction it indicates. The conflict is resolved as follows: the two lines are combined linearly, each weighted by its relative fitting error, to solve for the line L2 of fastest ground depth change; the larger a line's relative fitting error, the smaller its combination weight. The concrete method is as follows.
Let the relative fitting errors of the line L2 of fastest depth change, the linear perspective line LP, and the texture gradient line LT be δ2, δP, and δT respectively, and let the angles corresponding to their slopes be θ2, θP, and θT, each θ taking values in [−π/2, π/2]. Then:
θ2 = θP×δT/(δT+δP) + θT×δP/(δT+δP) (2)
δ2 = δP×δT/(δT+δP) + δT×δP/(δT+δP) (3)
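The weighting pattern of formulas (2) and (3) can be sketched as follows; this is an illustrative rendering, with the same convention that the line with the larger relative fitting error receives the smaller weight:

```python
def combine_directions(theta_p, delta_p, theta_t, delta_t):
    """Error-weighted combination of the linear perspective direction
    (theta_p, error delta_p) and the texture gradient direction
    (theta_t, error delta_t), per formulas (2) and (3)."""
    w_p = delta_t / (delta_t + delta_p)  # weight of the perspective line
    w_t = delta_p / (delta_t + delta_p)  # weight of the texture line
    theta2 = theta_p * w_p + theta_t * w_t   # formula (2)
    delta2 = delta_p * w_p + delta_t * w_t   # formula (3)
    return theta2, delta2
```

For example, if the perspective line fits three times better than the texture line (delta_p = 0.1, delta_t = 0.3), the perspective direction receives weight 0.75 and dominates the combined angle.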
Thus the line L2 is uniquely determined by its slope angle θ2 and the intersection point of LP and LT. The methods for computing LP and LT are introduced below.
2.1 Solving for the linear perspective line LP
Parallel lines extending into the distance in the objective world draw closer and closer together in the image plane, and may even converge. Such a group of lines is called convergence lines, and their point of convergence is called the vanishing point. In an image, parallel lines indicate a flat surface, while converging lines indicate a surface extending into the distance. For outdoor images, the linear perspective effect generally appears only in the image ground portion; for indoor images, however, it acts on both the ground portion and the sky (ceiling) portion. The depth rule of linear perspective is: the closer an object in the image is to the vanishing point, the greater its perceived depth, and vice versa. At the same time, the center line of the converging lines indicates the direction in which the perceived image depth changes fastest.
For each image, the Hough transform is first used to find the point sets corresponding to the 10 longest straight lines. Each point set is then fitted to a straight line by least squares, yielding the equation, slope angle θ, and relative fitting error δ of each line. Applying the same idea as in formulas (2) and (3), these 10 lines are combined linearly, weighted by their respective relative fitting errors, which readily yields the slope angle θP, relative fitting error δP, and equation of the linear perspective line LP.
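The least-squares step above, producing for each point set a slope angle θ and a relative fitting error δ, might be sketched as below. The Hough step that supplies the point sets is omitted, and the exact definition of the relative fitting error is not spelled out in the text; the residual norm over the data norm used here is one plausible reading, flagged as an assumption:

```python
import math

import numpy as np


def fit_line(xs, ys):
    """Least-squares line fit of a point set.

    Returns the slope angle theta in [-pi/2, pi/2] and a relative
    fitting error delta (residual norm divided by the data norm --
    an assumed error measure, not the text's exact definition).
    """
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    (k, b), res, *_ = np.polyfit(xs, ys, 1, full=True)
    theta = math.atan(k)  # slope angle of the fitted line
    resid = float(res[0]) if len(res) else 0.0
    delta = math.sqrt(resid) / (np.linalg.norm(ys) + 1e-12)
    return theta, delta
```

A perfectly collinear point set yields a near-zero δ, so in the error-weighted combination of formulas (2) and (3) its direction would dominate.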
2.2 Solving for the texture gradient line LT
From visual psychology it is known that the farther a surface is from the observer, the smaller its texture becomes. The reason is that the closer a region is to the viewpoint, the fewer homogeneous objects a retinal (imaging-plane) region of the same area contains; that is, the image resolution is greater and the texture elements are larger. Within the interior of an object, differences in pixel luminance are small, so objects are generally perceived as homogeneous regions. This means, statistically speaking, that the closer an image region of a given size is to the viewpoint, the smaller the sum of its pixel luminance differences should be. We therefore take the luminance difference degree of each pixel as its texture gradient, and use it to solve for the texture gradient line LT. The concrete computation process is as follows.
(1) Let I(m, n) be the luminance at an arbitrary pixel of the image ground, I = (R+G+B)/3, and compute the luminance difference degree Idiff(m, n) at that point according to formula (4). Z1 determines the computation range of each pixel's luminance difference; a value of 1, 2, or 3 is advisable.
(2) Divide the image ground evenly into blocks of Z2×Z2 pixels, and let the numbers of blocks in the horizontal direction (per row) and the vertical direction (per column) be S and T respectively. The luminance difference degree Mdiff of each block is the sum of the pixel luminance difference degrees Idiff of all points in the block. In each row (horizontal direction), find the block with the minimum Mdiff, denoting these blocks R1, R2, ..., RT−1, RT. Statistically speaking, the blocks R1, R2, ..., RT−1, RT represent, in each row, the region nearest the viewpoint. Z2 should not be too large; a value of about 5 is advisable.
(3) Fit the center point coordinates of the blocks R1, R2, ..., RT−1, RT to a straight line by least squares; this yields the slope angle θT, relative fitting error δT, and equation of the texture gradient line LT.
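Steps (2) and (3) above might be sketched as follows. Since the exact Idiff formula of step (1) is not reproduced in the text, the per-pixel luminance difference is approximated here by the local gradient magnitude; this approximation, and the function name, are assumptions:

```python
import numpy as np


def texture_gradient_points(ground, z2=5):
    """Tile the ground region (a 2-D luminance array) into blocks of
    z2 x z2 pixels, score each block by its summed per-pixel luminance
    difference (approximated by gradient magnitude, an assumed stand-in
    for the text's Idiff), and return the centre of the minimum-score
    block in each block row: the points R1..RT of step (2)."""
    gy, gx = np.gradient(ground.astype(float))
    idiff = np.abs(gx) + np.abs(gy)      # assumed Idiff surrogate
    H, W = ground.shape
    T, S = H // z2, W // z2              # block rows, blocks per row
    centres = []
    for r in range(T):
        scores = [idiff[r * z2:(r + 1) * z2, c * z2:(c + 1) * z2].sum()
                  for c in range(S)]
        c = int(np.argmin(scores))       # nearest (most homogeneous) block
        centres.append((c * z2 + z2 / 2, r * z2 + z2 / 2))  # (x, y) centre
    return centres
```

Fitting these centres with a least-squares line (e.g., `np.polyfit`) then gives the slope angle θT and relative fitting error δT of the texture gradient line LT, as in step (3).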
3. Computing the perceived depth map of the image ground
As shown in Figure 2, the minimum perceived depth, at the near point U, is denoted D_U; its value equals the distance from the camera to the nearest imaged point of the objective scene divided by the camera imaging coefficient B. The perceived depth over the image ground varies as follows: from the near point U toward the image medium line, along the line L2 of fastest depth change, the depth value increases linearly until it reaches its maximum at the vanishing point V; all points on any straight line perpendicular to L2 have the same depth (iso-depth lines). Let P(m, n) be an arbitrary pixel of the image ground with coordinates m, n. The relative perceived depth D_P at P(m, n) is solved for as follows.
Let the slope of the line L2 of fastest depth change be K2, and let the straight line L3 pass through the point P(m, n) perpendicular to L2. Then the slope of L3 is K3 = −1/K2, and the equation of L3 is:
x + K2×y − m − K2×n = 0 (5)
Let D_U-L3 be the distance from the near point U to the line L3; then:
D_U-L3 = |Ux + K2×Uy − m − K2×n| / (1 + K2^2)^(1/2) (6)
The perceived depth D_P at the point P(m, n) is then:
D_P = D_U + D_U-L3 (7)
In the general case, the perceived depth D_U of the near point is difficult to estimate; considering that it is much smaller than D_U-L3, it is ignored in the later experiments, i.e., set to 0.
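Formulas (5)-(7) yield the dense ground depth map directly, which might be sketched as below; the coordinate convention (m along columns, n along rows) and the function name are assumptions for illustration:

```python
import numpy as np


def ground_depth_map(shape, k2, u, d_u=0.0):
    """Dense relative perceived depth of the image ground, per formulas
    (5)-(7): depth at P(m, n) is d_u plus the distance from the near
    point U to the line L3 through P perpendicular to L2 (slope k2).

    shape: (rows, cols) of the ground region.
    k2:    slope of L2, the line of fastest depth change.
    u:     near point (Ux, Uy); d_u: its depth, set to 0 per the text.
    """
    rows, cols = shape
    # assumed convention: m indexes columns (x), n indexes rows (y)
    m, n = np.meshgrid(np.arange(cols), np.arange(rows))
    d_u_l3 = np.abs(u[0] + k2 * u[1] - m - k2 * n) / np.sqrt(1 + k2 ** 2)
    return d_u + d_u_l3   # formula (7)
```

All pixels on a line perpendicular to L2 receive the same value, and depth grows linearly away from U along L2, matching the stated variation pattern.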
4. Computing the perceived size of image objects
The perceived size of each image object is computed with formula (1). Because only relative perceived sizes are computed, the value of B in formula (1) can be set to 1:
S=B×A×D=A×D (8)
Other variations and modifications of the present invention will be apparent to those skilled in the art, and the present invention is not limited to the described embodiments. Therefore, any and all modifications, variations, or equivalent transformations that remain within the true spirit and basic principles of the disclosed content of the present invention fall within the protection scope of the claims of the present invention.