GB2520261A - Visual servoing - Google Patents
- Publication number
- GB2520261A (application GB1319973.2)
- Authority
- GB
- United Kingdom
- Prior art keywords
- micro
- lens
- image
- lenses
- array
- Prior art date
- Legal status: Granted (an assumption based on the listed status, not a legal conclusion)
Classifications
- G06T1/0014—Image feed-back for automatic industrial control, e.g. robot with camera
- G02B3/0043—Inhomogeneous or irregular lens arrays, e.g. varying shape, size, height
- G06T7/50—Depth or shape recovery
- G06T7/557—Depth or shape recovery from multiple images, from light fields, e.g. from plenoptic cameras
- H04N23/55—Optical parts specially adapted for electronic image sensors; Mounting thereof
- H04N23/57—Mechanical or electrical details of cameras or camera modules specially adapted for being embedded in other devices
- H04N23/80—Camera processing pipelines; Components thereof
- H04N23/951—Computational photography systems using two or more images to influence resolution, frame rate or aspect ratio
- H04N23/957—Light-field or plenoptic cameras or camera modules
- H04N23/958—Computational photography systems for extended depth of field imaging
- H04N23/959—Computational photography systems for extended depth of field imaging by adjusting depth of field during image capture, e.g. maximising or setting range based on scene characteristics
- G06T2200/21—Indexing scheme for image data processing or generation, in general, involving computational photography
- G06T2207/10052—Image acquisition modality: images from lightfield camera
Abstract
An apparatus comprising a moveable actuator and an imaging device coupled to the actuator, in which the actuator is controlled by visual servoing, based on images from the imaging device and a reference image. The actuator is controlled to minimise differences between a current image from the imaging device and the reference image. The present invention employs an imaging device including a micro-lens array, suitable for use in a light-field camera for example, comprising one or more subsets where the centres of the micro-lenses are slightly displaced versus a regular lattice, so that each micro-lens in each subset has a different offset. The small displacements are preferably defined by characteristic parameters of the light-field camera and in certain embodiments can advantageously be arranged in order to provide an optimal sampling distribution of the projected image. Embodiments of the present invention may be employed in so-called eye-in-hand cameras, which are mounted on or embedded in the actuator of a robotic apparatus.
Description
Visual Servoing

The present invention relates to methods and apparatus concerning visual servoing.
Visual servoing methods aim at guiding a robot from a current position to a reference position using a camera. One application of visual servoing is the so-called "eye-in-hand" robot or apparatus, whereby a camera is mounted on the actuator of the apparatus, for instance a robotic arm. Among visual servoing methods, the so-called photometric visual servoing (PVS) algorithm was introduced by the LAGADIC team of INRIA, located at Rennes, France; see "Visual servoing set free from image processing", Christophe Collewet, Eric Marchand, Francois Chaumette, ICRA, 2008. This algorithm defines a control law which computes the velocity of the robot from the difference between the current image and the reference image. The control law iteratively updates the velocity of the robot until the current image is equal to the reference image.
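The control law described above can be sketched as follows. This is a minimal illustration of the photometric principle (error equals the pixel-wise image difference, velocity obtained through the pseudo-inverse of an interaction matrix), not the cited authors' exact implementation; the function name and the `gain` parameter are illustrative:

```python
import numpy as np

def pvs_velocity(current, reference, L, gain=1.0):
    """One iteration of a photometric visual servoing control law
    (a sketch). The photometric error e = I - I* is the pixel-wise
    difference between the current and reference images; the 6-DOF
    camera velocity is v = -gain * L^+ e, where L^+ is the
    pseudo-inverse of the interaction matrix L (one row per pixel,
    six columns for the six degrees of freedom)."""
    error = (np.asarray(current, float) - np.asarray(reference, float)).ravel()
    return -gain * (np.linalg.pinv(L) @ error)
```

When the current image equals the reference image the error, and hence the commanded velocity, is zero, which is the convergence condition described above.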
The PVS algorithm offers very precise guiding: the current position converges to the reference position with great accuracy.
A so-called 4D light-field camera typically refers to a multi-array camera made of a lens array and a single sensor. This type of camera can be used for robot guiding based on PVS. The advantage of using a lens-array camera, having for example NxN lenses or micro-lenses, over a single-lens camera is that, for the same field of view and sensor, the camera thickness is typically shorter by a factor N, and the depth of field is enlarged by a factor N. Using a multi-array camera for robot guiding based on a PVS algorithm could therefore be advantageous, since such a camera can be made shorter in depth than a single-lens camera. Unfortunately, the accuracy of the guiding is typically decreased by a factor N with an NxN multi-array camera, as compared to a single-lens camera.
Light-field cameras record 4D (four-dimensional) light-field data which can be transformed into various reconstructed images, such as re-focused images with a freely selected focal distance, that is, the depth of the image plane which is in focus. A re-focused image is built by projecting the various 4D light-field pixels into a 2D (two-dimensional) image. Unfortunately the resolution of a re-focused image varies with the focal distance.
First let us consider light-field cameras which record a 4D light-field on a single sensor (a two-dimensional regular array of pixels). The 4D light-field is recorded by a multi-camera array: an array of lenses and the sensor. Figure 1 illustrates a light-field camera with two elements: the lens array and the sensor. Optionally, spacers may be located between the micro-lens array and the sensor, around each lens, to prevent light from one lens overlapping with the light of other lenses at the sensor side. The lenses of the lens array can be designed with a small radius, and hence the lenses which make up an array may be referred to as micro-lenses.
4D Light-Field data
The sensor of a light-field camera records an image which is made of a collection of 2D small images arranged within a 2D image. Each small image is produced by the lens (i,j) from the array of lenses.
Figure 3 illustrates the image which is recorded at the sensor. The sensor of a light-field camera records an image of the scene which is made of a collection of 2D micro-images, also called small images, arranged within a 2D image. Each small image is produced by a lens from the array of lenses. Each small image is represented by a circle, the shape of that small image being a function of the shape of the micro-lens. A pixel of the sensor is located by its coordinates (x,y). p is the distance in pixels between the centres of two contiguous micro-lens images. The micro-lenses are chosen such that p is larger than a pixel width. A micro-lens image is referenced by its coordinates (i,j). Some pixels might not receive any light from any micro-lens; those pixels are discarded. Indeed, the space between the micro-lenses can be masked to prevent photons falling outside of a lens (if the micro-lenses are square or another close-packed shape, no masking is needed). However, most of the pixels receive the light from one micro-lens. The pixels are associated with four coordinates (x,y) and (i,j). The centre of the micro-lens image (i,j) on the sensor is labelled (x_ij, y_ij). Figure 3 illustrates the first micro-lens image (0,0) centred on (x00, y00). The pixel rectangular lattice and the micro-lens rectangular lattice are not rotated relative to one another. The coordinates (x_ij, y_ij) can be written as a function of the parameters p and (x00, y00):

x_ij = p i + x00
y_ij = p j + y00 (1)

Figure 3 also illustrates how an object, represented by the black squares 3, in the scene is simultaneously visible in numerous micro-lens images. The distance w between two consecutive imaging points of the same object 3 on the sensor is known as the disparity. The disparity depends on the physical distance between the camera and the object; w converges to p as the object distance z tends to infinity.
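Equation (1) can be expressed directly in code. This is a sketch of the regular (unshifted) lattice described above; the helper name is illustrative:

```python
def microlens_centre(i, j, p, x00, y00):
    """Centre (x_ij, y_ij) of micro-lens image (i, j), per equation (1):
    a regular square lattice of pitch p (in pixels) anchored at the
    centre (x00, y00) of the first micro-lens image (0, 0)."""
    return p * i + x00, p * j + y00
```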
Geometrical property of the light-field camera
The previous section introduced w, the disparity of a given observed object, and p, the distance between the centres of two consecutive micro-lens images. Both distances are defined in pixel units. They are converted into physical distances (in metres) W and P by multiplying w and p respectively by the pixel size δ of the sensor: W = δw and P = δp.
The distances W and P can be computed knowing the characteristics of the light-field camera. Figure 2 gives a schematic view of the light-field camera having the following features.
* The lens array is made of N by N lenses having a focal distance f. The pitch of the micro-lenses is P. The micro-lenses might have any shape, such as circular or square. The diameter of the shape is less than or equal to the pitch P.
One can consider the particular case where the lenses are pinholes. With pinholes, the following equations remain valid by setting f = d.
* The sensor is made of a square lattice of pixels having a physical size of δ. δ is in units of metres per pixel. The sensor is located at the fixed distance d from the micro-lens array.
* The object is located at the distance z from the lens array. The distance between the lens array and the sensor is d. The disparity of the observed object between two consecutive lenses is equal to W. The distance between the centres of two consecutive lens images is P. From the Thales (intercept) theorem one can derive:

z / (z + d) = P / W (2)

Or:

W = P (z + d) / z (3)

This equation gives the relation between the physical object located at distance z from the lens array and the disparity W of the corresponding views of that object. This relation is built using geometrical considerations and does not assume that the object is in focus at the sensor side. The focal length f of the micro-lenses and other properties such as the lens apertures determine whether the micro-lens images observed on the sensor are in focus. In practice, one tunes the distance d once and for all using the relation:

1/z + 1/d = 1/f (4)

The micro-lens image of an object located at distance z from the micro-lens array appears in focus as long as the circle of confusion is smaller than the pixel size. In practice the range [z_min, z_max] of distances z which allows observing focused micro-images is large and can be optimized depending on the focal length f and the distance d: for instance, one could tune the micro-lens camera to have a range of z from 1 metre to infinity. Embodiments of the presently proposed invention however do not adopt this approach.
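Relations (3) and (4) can be sketched as follows; the function names are illustrative, and the formulas follow the geometric relations above:

```python
def disparity_from_depth(z, P, d):
    """Physical disparity W of an object at distance z from the lens
    array (equation (3)): W = P (z + d) / z, so W tends to P as z
    tends to infinity."""
    return P * (z + d) / z

def in_focus_depth(f, d):
    """Object distance z_f at which the micro-images are exactly in
    focus, from the thin-lens relation (4): 1/z + 1/d = 1/f
    (requires d > f)."""
    return 1.0 / (1.0 / f - 1.0 / d)
```

At z = z_f the two relations combine to give W = P d / f.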
Variation of the disparity

The light-field camera being designed, the values d and f are tuned and fixed. The disparity W varies with the object distance z. One notes special values of W:

* W_f is the disparity for an object at distance z_f such that the micro-lens images are exactly in focus; it corresponds to equation (4). Mixing equations (3) and (4) one obtains:

W_f = P d / f (5)

* W_α is the disparity for an object located at distance α z_f from the lens array. According to equation (3) one obtains:

W_α = P (α z_f + d) / (α z_f) (6)

The variation of disparity is an important property of the light-field camera. The ratio W_α / W_f is a good indicator of the variation of disparity. Indeed, the micro-lens images of objects located at z_f are sharp, and the light-field camera is designed to observe objects around z_f which are also in focus. The ratio is computed with equations (5) and (6):

W_α / W_f = ((α - 1) f + d) / (α d) (7)

The ratio is close to 1.0 for α around 1.0. In practice the variations of disparity are within a few percent around W_f. The present inventor has further brought to light the following aspects.
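The ratio of equation (7) lends itself to a one-line numerical check; the sketch below uses an illustrative name and assumes the derivation above:

```python
def disparity_ratio(alpha, f, d):
    """Ratio W_alpha / W_f of equation (7):
    ((alpha - 1) f + d) / (alpha d), which equals 1 when alpha = 1
    and stays within a few percent of 1 for alpha near 1."""
    return ((alpha - 1.0) * f + d) / (alpha * d)
```

The same value can be obtained by dividing equation (6) by equation (5), which is a useful consistency check on the derivation.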
Image refocusing

A major interest of light-field cameras is the ability to compute 2D images where the focal distance is freely adjustable. To compute a 2D image out of the 4D light-field, the small images observed on the sensor are zoomed, shifted and summed. A given pixel (x,y) of the sensor associated with the micro-lens (i,j) is projected into a 2D image according to the following equations:

X = s (g (x - x_ij) + x_ij)
Y = s (g (y - y_ij) + y_ij) (8)
Where (X,Y) is the coordinate of the projected pixel on the 2D refocused image.
The coordinate (X,Y) is not necessarily an integer. The pixel value at location (x,y) is projected into the 2D refocused image using common image interpolation techniques. The parameter s controls the size of the 2D refocused image, and g controls the plane which is in focus (the plane perpendicular to the optical axis for which the 2D image is in focus) as well as the zoom performed on the small images. The output image is s times the sensor image size. In this formulation the size of the re-focused image is independent of the parameter g, and the small images are zoomed by sg.
The previous equation can be reformulated owing to the regularity of the positions of the centres of the micro-lens images:

X = s g x + s p (1 - g) i + s (1 - g) x00
Y = s g y + s p (1 - g) j + s (1 - g) y00 (9)

The parameter g can be expressed as a function of p and w. It is computed by simple geometry. It corresponds to the zoom that must be performed on the micro-lens images, using their centres as reference, such that the various zoomed views of a same object get superposed. One deduces the following relation:

g = p / (p - w) (10)

This relation is used to select the distance z of the objects in focus in the projected image.
Including this last relation into equation (9), one rewrites the projection equation:

X = s g x - s g w i + s (1 - g) x00
Y = s g y - s g w j + s (1 - g) y00 (11)

This last formulation has the great advantage of simplifying the computation of the projected coordinates by splitting the pixel coordinates (x,y) and the lens coordinates (i,j) of the 4D light-field.
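The projection of equation (11) can be sketched as follows, recovering w from equation (10); the function name and the sample values in the usage note are illustrative:

```python
def project_pixel(x, y, i, j, s, g, p, x00=0.0, y00=0.0):
    """Project sensor pixel (x, y) seen under micro-lens (i, j) into
    the refocused image, per equation (11):
        X = s g x - s g w i + s (1 - g) x00
    with the disparity w recovered from equation (10),
    g = p / (p - w)  =>  w = p (g - 1) / g."""
    w = p * (g - 1.0) / g
    X = s * g * x - s * g * w * i + s * (1.0 - g) * x00
    Y = s * g * y - s * g * w * j + s * (1.0 - g) * y00
    return X, Y
```

Two views of the same object, offset by the disparity w between consecutive micro-lenses, project to the same refocused coordinate, which is exactly the superposition property used to derive equation (10).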
Sampling property of the refocused image

The different pixels of the light-field image are projected into the re-focused image according to the above-described method and define a set of projected coordinates (X,Y) on the grid of the refocused image. It has been recognized by the present inventors that the distribution of the set of projected coordinates is an important property which can be used to characterise the resolution of the refocused image, and in particular its regularity or homogeneity. As will be explained later, the present invention addresses this inhomogeneity.
It is not trivial to characterise the homogeneity of the projected 4D light-field pixels in the 2D re-focused image. To study this property one considers that the coordinate of the first micro-lens centre (x00,y00) is equal to (0,0). One obtains the following simplified projection equations, with u = sg:

X = u x - u w i = u (x - w i)
Y = u y - u w j = u (y - w j) (12)

This set of equations shows a simple relation between the four dimensions x, y, i, j and the projected coordinates (X,Y). The value u = sg is a constant independent of w if s = k/g, where k is any constant. In this condition, the size of the re-focused image is a function of w and is equal to k/g times the size of the original image.
Figure 4 illustrates the 1D projected coordinate X for particular settings: s = 0.5, g = 7.677, w = 151.05, u = 3.83 and p = 173.67. The x-axis shows the projected coordinates X; the y-axis indicates the micro-lens coordinates i of the projected pixels. One notices that 8 micro-lenses contribute to the observed projected coordinates X, which in this case is approximately equal to the zoom factor g. The distribution of the projected points X is not homogeneous, since the values h and H, representing respectively the minimum and the maximum sampling steps between two consecutive projected coordinates X, are substantially different from each other. In this example, the projected coordinates are nearly superposed, clustered in groups of eight.
Figure 5 illustrates the same view with the same settings except that w = 151.25. The distribution of the projected points X is homogeneous, the projected points being distributed with equal spacing along the x-axis. {w} plays a major role in the maximum sampling step between the projected coordinates, where {.} denotes the fractional part of a number.
Several cases of h and H occur depending on {w}:

1. With {w} = 0: h = 0 and H = u. N projected coordinates X overlap. The distance between two non-overlapping consecutive X is constant and equal to H. The projected coordinates define a perfect sampling with a constant sampling step equal to H = u.

2. With {w} = n/N, where n and N are positive integers such that 0 < n < N: the number of overlapping projected coordinates X is equal, on average, to gcd(n, N), where gcd(n, N) refers to the greatest common divisor of n and N. The projected coordinates define a perfect sampling with a constant sampling step equal to H = u gcd(n, N)/N. The sampling step is smaller if N is a prime number. Indeed, if N is not a prime number, the number of overlapping coordinates increases, as does the sampling step.
H is a good indicator for estimating the resolution of the re-focused image. Figure 6 illustrates the normalized sampling step H/u as a function of {w} for a conventional light-field camera characterized by p = 173.67 and w_f = 150. This function is built from the cases described above: points surrounded by black circles depict the first case (h = 0 and {w} = 0); points surrounded by empty circles depict the second case (all the possible regular grids); other points lying on the dark line segments correspond to the remaining, irregular case, with all possible values of {w} and N = 7 (all possible irregular grids). The best possible resolution is given by H/u = 1/N = 1/7.
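The cases above can be checked numerically. The sketch below reduces equation (12) modulo the integer pixel grid, so that the sampling steps are simply the gaps between the fractional offsets {w i} over one unit cell; this is an illustrative helper, not the patent's own procedure:

```python
def sampling_steps(w, N, u=1.0):
    """Minimum and maximum sampling steps (h, H) of the projected
    coordinates X = u (x - w i) of equation (12), for lens indices
    i = 0..N-1. Modulo the integer pixel grid, lens i contributes
    samples at fractional offset {w i}; sorting those offsets over
    one unit cell gives the gaps between consecutive samples."""
    offsets = sorted((w * i) % 1.0 for i in range(N))
    gaps = [offsets[k + 1] - offsets[k] for k in range(N - 1)]
    gaps.append(offsets[0] + 1.0 - offsets[-1])  # wrap-around gap
    return u * min(gaps), u * max(gaps)
```

For {w} = 0 this returns h = 0 and H = u (case 1); for {w} = n/N it returns the constant step H = u gcd(n, N)/N (case 2), for example H = u/4 for w = 151.25 and N = 4.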
Problem of common light-field cameras
The projection of the 4D light-field pixels defines a set of projected points having a distribution which depends on the selected focal distance (i.e. the object plane which it is desired to have in focus). As explained above, the resolution of the re-focused image highly depends on the distribution of the projected coordinates (X,Y). The resolution can be estimated by the maximum sampling step H. Unfortunately, H depends on {w} and varies from u to u/N.
Variations of H make the resolution of the re-focused image vary. The present inventors have recognized that the distribution characterizes the resolution of the projected image.
Experimentation has shown that actuator guidance based on PVS, using a camera array of NxN micro-lenses, converges to the reference position, but the average guiding accuracy is a factor N less accurate than with a single-lens camera. Here guiding accuracy refers to the average positioning error after the control has been iterated many times and the current position has converged to the reference position. In fact the guiding accuracy depends on the average distance of the objects being observed by the camera. For some object distances, the micro-lens images are overlapped and the guiding accuracy is poor. For other object distances the micro-lens images are ideally interleaved and the guiding accuracy is as good as with a single lens having the same sensor.

According to a first aspect of the invention there is provided an apparatus comprising a moveable actuator, an imaging device coupled to said actuator, and a controller adapted to receive images from said imaging device and control movement of said actuator, wherein said controller is configured to compare a reference image with a current image captured by the imaging device for a current position of the actuator, and to control said actuator to minimise differences between the current image and the reference image; and wherein said imaging device comprises a plurality of micro-lenses arranged in a micro-lens array relative to a regular square lattice, and a photo-sensor having an array of pixels, each micro-lens of the array projecting an image of a scene on an associated region of the photo-sensor forming a micro-image; wherein the micro-lens array comprises one or more micro-lens sub-sets, each sub-set comprising a two-dimensional array of micro-lenses displaced relative to the regular lattice according to a common pattern, the common pattern defining different displacements for each micro-lens of a sub-set.
By using a shifted micro-lens array in this way, the guiding accuracy will be almost identical to that of a single-lens camera, whatever the object distances to the camera. The shifted micro-lens array is designed for a typical distance for which the shifts are computed.
Embodiments of the present invention therefore employ a micro-lens array, suitable for use in a light-field camera for example, where the centres of the micro-lenses are slightly displaced versus a regular lattice. The small displacements are preferably defined by characteristic parameters of the light-field camera and in certain embodiments can advantageously be arranged in order to provide an optimal sampling distribution of the projected image.
This affords the advantage that a sensor or camera having a shorter depth dimension can be provided. In addition, an extended depth of field compared to a single-lens camera can be provided. Furthermore, by employing a shifted micro-lens array, accuracy is restored to substantially that of a conventional single-lens camera arrangement.
In embodiments, the image sensor is coupled to the actuator in such a way that movements of the actuator are replicated by the image sensor, i.e. in a rigid manner.
In embodiments the controller is adapted to control the velocity of the actuator based on the difference between the reference image and the current image.
Preferably the velocity of the actuator is iteratively updated by the controller for each new image acquired by said imaging device, until the current image and the reference image are substantially the same, or equivalently the velocity is substantially zero. The velocity may be calculated as a function of the difference between the reference image and the current image, the function being determined by an interaction matrix, in certain embodiments. As will be explained below, in embodiments of the invention the interaction matrix depends on the distance of an object in a captured image from the sensor, and in such embodiments an estimate of said distance can be derived from said imaging device having a micro-lens array. For example, because a light-field camera having such a micro-lens array is able to capture a scene from slightly different angles of view, an estimate of object depth can be made.
In embodiments the or each sub-set comprises a square array of NxN micro-lenses. Embodiments may comprise a single sub-set of micro-lenses. For example, the imaging device may include a lens array having a total of four lenses in a 2x2 array pattern, or a total of nine lenses in a 3x3 array pattern, or a total of 16 lenses in a 4x4 array pattern. In each case, the lenses are shifted slightly from the regular square-grid positions by the pattern of displacements (although typically one lens of the sub-set or array may not be shifted).
In embodiments including a square array of NxN micro-lenses, the interaction matrix, which determines the relationship between the velocity of the actuator and the difference between the reference image and the current image, is a combination of NxN decentred matrices.
Embodiments of the present invention may be employed in so-called eye-in-hand cameras, which are mounted on or embedded in the actuator of a robotic apparatus. Such cameras are expected to be as small and lightweight as possible. The thickness of a camera is determined by the focal length of the lens.
To make a camera smaller, one option is to use a light-field camera made of a micro-lens array mounted on a single sensor. The distance between the micro-lens array and the sensor is equal to the focal length of the micro-lenses. That distance is typically N times smaller than in a single-lens camera, N being the number of micro-lenses covering the sensor in each dimension, as illustrated in Figure 12.
Each micro-lens forms a small micro-image. The field of view of each micro-lens is almost identical to the field of view of the single-lens camera, assuming the same sensor size and an N times shorter focal length. Another advantage of this design is that the depth of field of the micro-images is N times larger than the depth of field of the single-lens camera. This is a great advantage since the depth of field is often quite limited when a single lens is used with a large aperture to collect more light with a short exposure time.
The common pattern defines a displacement model whereby each micro-lens of the or each sub-set is located according to the common pattern but differently located with respect to the regular lattice. The pattern, in certain embodiments, defines a number of possible displacements, and hence positions, for each micro-lens. In embodiments having multiple sub-sets, although it is simpler for the actual displacements of micro-lenses in different sub-sets to be the same, embodiments of the invention allow different sub-sets of the plurality to have different displacements, while still adhering to the same common pattern or model of displacements. However, the pattern or model is such that, even with a certain degree of flexibility provided for each lens displacement, the relative displacements between micro-lenses of a sub-set adhere to a controlled relationship, and that relationship is observed similarly across sub-sets of the plurality.
Thanks to these characteristics, the resolution of an image reconstructed from the micro-images is improved over conventional light-field cameras. In particular a good diversity of sampling is obtained while variations of resolution are avoided. Thus a more regular resolution is obtained for any focalization distance.
Typically, all subsets of the micro-lens array share a common pattern. However, the plurality of subsets need not encompass the whole micro-lens array. It could be envisaged for example that a first common pattern could apply to a first plurality of subsets, and a second common pattern could apply to a second plurality.
Advantageously, the common pattern defines each displacement as a function of the position (i,j) of each micro-lens within the or each sub-set. The common pattern may further define each displacement as a function of the number of micro-lenses in each sub-set. The obtained dispositions of the micro-lenses provide an advantageous distribution for the projected image and reduce the superposition or clustering of pixels in a reconstructed image, which makes it possible to obtain a more constant resolution for any focalization distance.
According to one embodiment, the common pattern defines displacements in integer multiples of unit displacement vectors. The vectors are preferably orthogonal vectors. These features allow reducing the super-position or the clustering of pixels while reconstructing an image from the micro-images. The magnitude (r) of the unit displacement vectors is advantageously a function of the focal distance of micro-lenses.
In embodiments having an array of NxN micro-lenses, the pattern defines a different displacement, considered modulo N, for each micro-lens. Thus for N=2, for example, there are four different displacements (modulo N), and for N=3 there are nine different displacements. Each displacement may be constructed, for example, by considering an NxN grid of points having a spacing r, with one corner selected as a coordinate origin. Each vertex of the grid then defines a displacement. If we take the top left as the coordinate origin and define the positive i and j directions as right and down as viewed, as per Figure 7, then in the example of a 2x2 array, displacements of (0,0), (1,0), (0,1) and (1,1) are defined (modulo 2). It is noted however that the position (i,j) of the micro-lens in the array does not correspond to the displacement as considered above. In Figure 7 for example, the displacement of the lens of the array located at (0,1) is (1,1).
Thus, at each lens position a displacement vector is assigned, and owing to the modulo-N arithmetic, the possible positions for that vector are repeated in an NxN grid arrangement. Thus for N=2, a displacement of (1,1) is equivalent to (1,-1), (-1,-1), (-1,1), etc. This geometry is illustrated for N=2 in Figure 16. The undisplaced centre point for each given lens (i.e. the vertices of the regular lattice of the array) can be considered as the intersection of the dashed lines in Figure 16. Depending on which (of the four) lenses of the array we are considering, a single type of symbol illustrates the possible positions of that lens. At lens position (i,j) = (0,0), as illustrated in Figure 7, the displacement is zero, and therefore the locus of possible positions corresponds to the circles. For lens position (0,1) the possible lens positions are indicated by a triangle.
Thus a repeating pattern is defined, for each lens, defining a set of possible positions according to the position of that lens in the array. An equivalent repeating pattern could be constructed for the case of N=3, for example, having nine different symbols, each corresponding to one of nine possible displacements repeated modulo N in two dimensions. The same principle extends to any value of N.

In another embodiment, the magnitude of the unit displacement vectors is a function of the number of micro-lenses in each sub-set. The multiple of the unit vectors for each micro-lens may further be a function of the position of the micro-lens within the sub-set. The resulting dispositions of the micro-lenses provide an advantageous distribution for the projected image and reduce the superposition or clustering of pixels in a reconstructed image.
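One such repeating pattern can be sketched as follows. The assignment ((i mod N) r, (j mod N) r) used here is an illustrative choice consistent with the general description (each lens of a sub-set receives a different displacement, and the pattern repeats identically across sub-sets); Figure 7 itself uses a different assignment of displacements to lens positions:

```python
def displaced_centres(num_lenses, N, pitch, r):
    """Micro-lens centres for a shifted array: a regular square
    lattice of the given pitch, where each lens of every N x N
    sub-set is offset by ((i mod N) * r, (j mod N) * r). This
    particular assignment of displacements to lens positions is an
    illustrative choice, not the one shown in Figure 7."""
    return {(i, j): (pitch * i + (i % N) * r, pitch * j + (j % N) * r)
            for i in range(num_lenses) for j in range(num_lenses)}
```

Every lens of a sub-set receives a distinct displacement, and the same set of displacements recurs in every sub-set, as the common-pattern model requires.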
In a particular embodiment, the sub-sets comprise a square array of NxN micro-lenses. Furthermore, the common pattern defines a plurality of possible displacements for each micro-lens, each of said plurality being equivalent in modulo N. In a further embodiment, the displacement of at least one micro-lens in each sub-set is zero. This provides a common reference in each micro-lens sub-set, so that the computation of a reconstructed image is simplified.
Furthermore, it allows the relative position between the micro-lenses to be precisely obtained. This feature also simplifies the fabrication of the micro-lens array.
Advantageously, the common pattern and the associated displacements are independent of the location of the sub-set within the micro-lens array.
These features reduce the super-position or the clustering of pixels when reconstructing an image from the micro-images. Advantageously, the magnitude has a fixed value representing characteristics of the imaging device.
In embodiments of the invention, the displacement (if any) of each micro-lens is of the order of 1/1000 of the micro-lens pitch. Embodiments may therefore have displacements in the range of approximately 0–5 µm, or approximately 0–10 µm, for example.
The invention also provides a computer program and a computer program product for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein, and a computer readable medium having stored thereon a program for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein.
The invention extends to methods, apparatus and/or use substantially as herein described with reference to the accompanying drawings. Any feature in one aspect of the invention may be applied to other aspects of the invention, in any appropriate combination. In particular, features of method aspects may be applied to apparatus aspects, and vice versa. Furthermore, features implemented in hardware may generally be implemented in software, and vice versa. Any reference to software and hardware features herein should be construed accordingly.
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which:

Figure 1 - Schematic view of a multi-array camera made of a lens array and a single sensor.
Figure 2 - Detailed view of a light-field camera made of perfect lenses.
Figure 3 - Schematic view of the 4D light-field data recorded by the 2D image sensor of a light-field camera.
Figure 4 - Illustration of the coordinates of the projected 4D light-field pixels into the 2D projected image: normal case.
Figure 5 - Illustration of the coordinates of the projected 4D light-field pixels into the 2D projected image: special case associated with a special disparity.
Figure 6 - Normalized maximum sampling step of the projected/re-focused image from 4D light-field data.
Figure 7 - Schematic view of the proposed micro-lens array with displaced micro-lenses for a super-resolution factor N=2.
Figure 8 - Schematic view of the proposed micro-lens array with displaced micro-lenses for a super-resolution factor N=3.
Figure 9 - Normalized maximum sampling step of the projected/re-focused image from 4D light-field data obtained with the proposed micro-lens array with displaced micro-lenses for a super-resolution factor of N=2.
Figure 10 - Normalized maximum sampling step of the projected/re-focused image from 4D light-field data obtained with the proposed micro-lens array with displaced micro-lenses for a super-resolution factor of N=3.
Figure 11 - Projection of a 3D point into the sensor of a single-lens camera.
Figure 12 - Camera thickness comparison between a single-lens camera and a lens-array camera.
Figure 13 - Projection of a 3D point into the sensor of a single-lens camera with a decentred lens.
Figure 14 - Pixel coordinates of the micro-images on the sensor.
Figure 15 - Articulated actuator including an image sensor.
Micro-lens array with displaced micro-lenses

The pixels of the 4D light-field image are projected into a re-focused image. As described above, the maximum sampling step of the projected coordinates depends on {w}, the fractional part of the disparity. The variations of the sampling step are due to the superposition or clustering of the projected coordinates for certain values of {w}, as illustrated in figure 4.
To decrease the superposition or the clustering of the projected coordinates (X,Y), the micro-lens images are shifted as compared to a regular array, so as to reduce or prevent overlapping or clustering of projected pixels. In other words, in embodiments of the invention the centre of a given micro-lens (i,j) is shifted by the given shift (Δx(i,j), Δy(i,j)), so that the modified projected coordinates (X',Y') of this new light-field camera become:

    X' = ux - uw(i + Δx(i,j)) = X - uwΔx(i,j)
    Y' = uy - uw(j + Δy(i,j)) = Y - uwΔy(i,j)    (13)

(Δx(i,j), Δy(i,j)) are shifts given in units of the distance between the micro-lens centres, or the micro-lens image centres. The motivation for moving the micro-lenses is to have a perfect and constant sampling of the projected coordinates (X',Y') for any w = ⌊w⌋ + n/N, where N defines the micro-lens array size, and n is any integer such that n ∈ [0,N[. Equation (13) becomes:

    X'' = N(x - ⌊w⌋i) - ni - Δx(i,j)(N⌊w⌋ + n)
    Y'' = N(y - ⌊w⌋j) - nj - Δy(i,j)(N⌊w⌋ + n)    (14)

(X'',Y'') are normalized projected coordinates, such that (X'',Y'') are integers for a perfect sampling of the projected coordinates (X',Y'). For a perfect sampling, Δx(i,j)(N⌊w⌋ + n) and Δy(i,j)(N⌊w⌋ + n) must also be integers, respectively equal to k(i,j) and l(i,j). These constraints give the following values for (Δx(i,j), Δy(i,j)):

    Δx(i,j) = k(i,j)/(N⌊w⌋ + n)    Δy(i,j) = l(i,j)/(N⌊w⌋ + n)    (15)

The displacement of the micro-lens images depends on w. In other words, for a given micro-lens displacement (Δx(i,j), Δy(i,j)), the shift of the corresponding micro-lens image depends on the disparity w. The previous equations can be approximated by taking into account two considerations: 1) w ≫ N; and 2) the variations of w are small, so that w can be considered constant and equal to w_focus. Indeed, it has been shown (cf. equation 7) that the ratio w/w_focus is typically very close to 1. In this condition, equation (15) can be approximated by:

    Δx(i,j) ≈ k(i,j)/(N·w_focus)    Δy(i,j) ≈ l(i,j)/(N·w_focus)
    Δx*(i,j) ≈ (δ/(N·W_focus))·k(i,j)    Δy*(i,j) ≈ (δ/(N·W_focus))·l(i,j)    (16)

The second line of the previous equation is obtained knowing that w_focus = W_focus/δ, where δ is the physical size of a pixel. The approximation does not depend on the disparity w. Thus, by using this approximation, it is possible to build a micro-lens array with an irregular grid of micro-lenses such that the projected coordinates (X'',Y'') do not overlap or cluster, as happens for the projected coordinates (X,Y) of conventional light-field imaging devices.
A remaining question is how to define the two functions k(i,j) and l(i,j) to obtain optimum micro-lens displacements, such that the projected coordinates (X'',Y'') have minimum clustering, and a perfect sampling when w = ⌊w⌋ + n/N. Equation (14) can be simplified considering equation (15) and w ≫ N:

    X'' = N(x - ⌊w⌋i) - ni - k(i,j)
    Y'' = N(y - ⌊w⌋j) - nj - l(i,j)    (17)

To obtain a perfect sampling, the set of projected coordinates (X'',Y'') defined by the various lens coordinates (i,j) must take all possible integer values whatever n. This constraint can be reformulated by taking into consideration modular arithmetic modulo N:

    X'' ≡ -ni - k(i,j)    (mod N)
    Y'' ≡ -nj - l(i,j)    (mod N)    (18)

k(i,j) and l(i,j) are two periodic functions, one period being defined with (i,j) ∈ [0,N[², into [0,N[. One searches for k(i,j) and l(i,j) such that, for any given n ∈ [0,N[, the set of projected coordinates (X'' mod N, Y'' mod N) defined by (i mod N, j mod N) ∈ [0,N[² is equal to the sum of Dirac functions δ(a,b) over all integer coordinates (a,b) ∈ [0,N[²; in other words, each integer coordinate is covered exactly once.
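The perfect-sampling constraint of equation (18) — for every n, the lens indices (i,j) must map onto all of [0,N[² exactly once — can be checked by brute force. The sketch below assumes the linear family k(i,j) = Tj, l(i,j) = i + j used in the experimental solution later in the text; helper names are illustrative:

```python
from itertools import product

def is_perfect_sampling(N, k, l):
    """True if, for every n in [0, N), the coordinates
    (X'', Y'') = (-n*i - k(i,j), -n*j - l(i,j)) mod N, taken over all lens
    positions (i, j) in [0, N)^2, cover [0, N)^2 exactly once (equation 18)."""
    for n in range(N):
        coords = {((-n * i - k(i, j)) % N, (-n * j - l(i, j)) % N)
                  for i, j in product(range(N), range(N))}
        if len(coords) != N * N:
            return False
    return True

# Linear solution k = T*j, l = i + j (i.e. A=0, B=T, C=1, E=1, K=L=0).
def make_kl(T):
    return (lambda i, j: T * j), (lambda i, j: i + j)

assert is_perfect_sampling(2, *make_kl(1))
assert is_perfect_sampling(3, *make_kl(1))
assert is_perfect_sampling(5, *make_kl(3))
assert not is_perfect_sampling(5, *make_kl(1))   # T=1 fails for N=5
```

For this linear family the map is a bijection exactly when the determinant n² + n − T is invertible modulo N, which is the gcd condition derived below.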
To solve equation (18) the following linear solutions are considered:

    k(i,j) ≡ Ai + Bj + K    (mod N)
    l(i,j) ≡ Ci + Ej + L    (mod N)    (19)

Equation (18) becomes:

    X'' ≡ -ni - Ai - Bj - K    (mod N)
    Y'' ≡ -nj - Ci - Ej - L    (mod N)    (20)

From the second member of the previous equation, one derives the value of i:

    i ≡ -C'Y'' - C'nj - C'Ej - C'L    (mod N)    (21)

where C' is the inverse of C, such that CC' ≡ 1 (mod N). If gcd(C,N) = 1 then C' exists and is unique. Replacing (21) into (20) one obtains:

    X'' ≡ j(n²C' + nC'(A+E) + AC'E - B) + C'(Y'' + L)(n + A) - K    (mod N)    (22)

The set of projected coordinates located at (X'' mod N, Y'' mod N) must cover all coordinates for (i mod N, j mod N) ∈ [0,N[², whatever n ∈ [0,N[. With equation (22) one deduces that X'' (mod N) must take every possible value whatever Y'' (mod N). Thus the second-order polynomial function m(n) = n²C' + nC'(A+E) + AC'E - B must verify:

    gcd(m(n) mod N, N) = 1    ∀ n ∈ [0,N[
    i.e. gcd((n²C' + nC'(A+E) + AC'E - B) mod N, N) = 1    ∀ n ∈ [0,N[    (23)

The NxN micro-lenses define a sub-set of micro-lenses. For a given sub-set, the values A, B, C, E, K, L are freely selected subject to the previous equation.
The parameters A,B,C,E,K,L may take different values in the different sub-sets.
Parameters K, L define which micro-lens, if any, from a given sub-set is not displaced with respect to the regular lattice.
Experimental solution

Many values of A, B, C, E verify equation (23). The special case A=0, B=T, C=1, E=1, K=0 and L=0 is detailed in this section. The proposed solution has the following form:

    k(i,j) ≡ Tj    (mod N)
    l(i,j) ≡ i + j    (mod N)    (24)

T is a free parameter which has been experimentally determined for various values of N. The experimentation consists in testing the values of T ∈ [0,N[ such that the constraint gcd(m(n) mod N, N) = 1 is respected for any n ∈ [0,N[. The following table indicates the smallest value of T satisfying that constraint:
| N | T | N | T | N | T | N | T | N | T |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 6 | 1 | 11 | 3 | 16 | 1 | 21 | 1 |
| 2 | 1 | 7 | 1 | 12 | 3 | 17 | 1 | 22 | 3 |
| 3 | 1 | 8 | 1 | 13 | 1 | 18 | 1 | 23 | 1 |
| 4 | 1 | 9 | 1 | 14 | 1 | 19 | 3 | 24 | 1 |
| 5 | 3 | 10 | 3 | 15 | 4 | 20 | 3 | 25 | 3 |

It follows that the periodic functions k(i,j) and l(i,j) are fully characterized, and thus the shifts (Δx(i,j), Δy(i,j)) of the micro-lens images versus the regular grid are also fully characterized. To convert the shifts into physical units at the micro-lens side, the shifts must be multiplied by the appropriate scale factor. The physical shifts (φx(i,j), φy(i,j)) at the micro-lens side are computed easily by combining equations (5) and (16):

    φx(i,j) = (f/d)·(δ/N)·k(i,j)    φy(i,j) = (f/d)·(δ/N)·l(i,j)    (25)

The physical shifts can be decomposed into the increment r = fδ/(dN), which is multiplied by the integer values given by k(i,j) and l(i,j) to obtain the physical shifts. The design of the micro-lens array is therefore defined by:

* The focal distance f of the micro-lenses.
* The average pitch P between consecutive lenses.
* The distance d between the micro-lens array and the sensor.
* The pixel size δ of the sensor.
* The factor N which defines the size of the micro-lens sub-set, made of N by N lenses.
* The micro-lens centres (px, py), which are located following the equation:

    px = iP + (f/d)·(δ/N)·k(i,j)
    py = jP + (f/d)·(δ/N)·l(i,j)    (26)

It should be recalled that the functions k(i,j) and l(i,j) are defined modulo N: the centres (px, py) are therefore valid, as are the centres obtained by adding any integer multiple of N (in units of r) to k(i,j) or l(i,j). Consequently the displacements can be negative.
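The design recipe above can be sketched numerically. The helper below (names illustrative) computes the increment r and the displaced centre positions of one sub-set, using the experimental solution k(i,j) = Tj mod N, l(i,j) = (i + j) mod N:

```python
def microlens_centres(N, f, d, P, delta, T=1):
    """Centre positions (px, py) of one N x N sub-set, per the design equations:
    r = f*delta/(d*N), k(i,j) = T*j mod N, l(i,j) = (i+j) mod N,
    px = i*P + r*k(i,j), py = j*P + r*l(i,j)."""
    r = f * delta / (d * N)
    centres = {}
    for i in range(N):
        for j in range(N):
            k = (T * j) % N
            l = (i + j) % N
            centres[(i, j)] = (i * P + r * k, j * P + r * l)
    return r, centres

# Example parameters from the design table: f = d = 2 mm, P = 1 mm, delta = 0.004 mm.
r, centres = microlens_centres(N=2, f=2.0, d=2.0, P=1.0, delta=0.004)
assert abs(r - 0.002) < 1e-12           # r = 2 micrometres for N = 2
assert centres[(0, 0)] == (0.0, 0.0)    # one lens of the sub-set is undisplaced
assert centres[(0, 1)] == (r, 1.0 + r)  # lens (0,1) shifted by (r, r), cf. Figure 7
```

The computed sub-set reproduces the N=2 table given later in the text: lens (0,0) stays on the lattice, while the other three lenses carry the distinct displacements of the pattern.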
The micro-lens array is designed according to the previous settings. If the size of the micro-lenses is equal to the pitch P, then the micro-lenses might have very small overlaps due to the displacement of the micro-lenses versus the square lattice. This issue is solved by designing micro-lenses such that the micro-lens size is smaller than the pitch P. The shape of the micro-lenses can be circular, rectangular or any other shape, without any modification of the previous equations. The number of micro-lenses in the micro-lens array is defined such that the array covers the physical size of the sensor. The micro-lens array being designed, it is located at distance d from the sensor.
Micro-lens array design

An imaging device including the above proposed arrangement as well as a micro-lens array will now be described. The following values are chosen for the different parameters:

| Symbol | Value | Description |
|---|---|---|
| f | 2 mm | Micro-lens focal distance |
| d | 2 mm | Distance between the micro-lens array and the sensor |
| P | 1 mm | Micro-lens pitch |
| δ | 0.004 mm/pixel | Physical size of a pixel of the sensor |
| z | infinity | Object is located at infinity from the lens-array focus |
| p | 250.0 pixel | Pitch in pixel units of the micro-lens images projected on the sensor |
| W | 1.0 mm | Disparity in physical units observed on the sensor for the object located at distance z from the main lens |
| w | 250.0 pixel | Disparity in pixel units observed on the sensor for the focused object located at distance z from the main lens |
| N | 8 | Size of the micro-lens sub-set (N by N) |
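The pixel-unit quantities in this table follow from the physical ones. A quick consistency check, assuming the relations p = P/δ and w = W/δ implied by the units (the source text is partially garbled here, so these relations are an assumption):

```python
delta = 0.004   # physical pixel size, mm/pixel
P = 1.0         # micro-lens pitch, mm
W = 1.0         # disparity observed on the sensor, mm

p = P / delta   # micro-image pitch in pixel units
w = W / delta   # disparity in pixel units
assert abs(p - 250.0) < 1e-9
assert abs(w - 250.0) < 1e-9
```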
In the case of a 2 by 2 lens array, the increment r = fδ/(dN) is equal to r = 2 µm. The values k(i,j), l(i,j) and the centre coordinates (px, py) for the first sub-set of NxN micro-lenses are given in the following table:

| i | j | k(i,j) | l(i,j) | px | py |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | P | r |
| 0 | 1 | 1 | 1 | r | P + r |
| 1 | 1 | 1 | 0 | P + r | P |

Figure 7 illustrates the displacement of the micro-lenses versus the regular square lattice. The bold arrows indicate the displacement of the micro-lens centres by r in the direction indicated by the arrow. It is worth noting that the arrows displayed in that figure have been artificially enlarged for illustration purposes.
In the case of a 3 by 3 lens array, r = 1.333 µm. The values k(i,j), l(i,j) and the centre coordinates for the first sub-set of NxN micro-lenses are given in the following table:

| i | j | k(i,j) | l(i,j) | px | py |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | P | r |
| 2 | 0 | 0 | 2 | 2P | 2r |
| 0 | 1 | 1 | 1 | r | P + r |
| 1 | 1 | 1 | 2 | P + r | P + 2r |
| 2 | 1 | 1 | 0 | 2P + r | P |
| 0 | 2 | 2 | 2 | 2r | 2P + 2r |
| 1 | 2 | 2 | 0 | P + 2r | 2P |
| 2 | 2 | 2 | 1 | 2P + 2r | 2P + r |

Figure 8 illustrates the displacement of the micro-lenses versus the regular square lattice. The same legend as figure 7 applies to figure 8.
Figure 7 illustrates a case where N = 2. In this case the increment r = fδ/(dN) is equal to r = 2 µm. Figure 7 shows the displacements of the micro-lenses versus a regular lattice. The regular lattice is defined by the equidistant dashed lines 0,1,2,3 extending in both directions i and j. The directions i and j are preferably perpendicular. In a conventional micro-lens array the centres of the micro-lenses are located at the intersections of the lines defining the regular lattice, to form a regular grid of equidistant micro-lenses. According to an embodiment of the present invention the micro-lenses are arranged on the array in the following way, in accordance with formula (26).
The bold arrows indicate the displacement, as a shift vector, of the micro-lens centres. The amount of displacement is given by a multiple of a fixed increment r, in accordance with formula (26) and the table above. The arrows displayed in the figure have been artificially enlarged for illustration purposes.
A plurality of micro-lenses are shifted with respect to the regular lattice: it follows that micro-lenses are set out of regular alignment in a particular way that reduces the superposition or the clustering of pixels in a reconstructed image. Preferably, each micro-lens in each micro-lens sub-set is displaced by a different shift vector, to increase the resolution of a reconstructed image. Each shift vector has a shift magnitude and a direction. Optionally, at least one micro-lens in each sub-set is not displaced with respect to the regular lattice. The micro-lens array can be made unitarily of glass or synthetic glass. Possible processes for forming the micro-lenses on a glass plate include lithography and/or etching and/or melting techniques.
Figure 8 is similar to figure 7 but illustrates a case with N = 3. In this case the increment is r = 1.333 µm. The displacement of the micro-lenses versus the regular lattice is similar to figure 7, but exhibits a different pattern. In this case displacements have i and/or j components which may be 0, r or 2r.
It is important to note that the relative positions of the micro-lenses in each N by N array or sub-set (N being the number of micro-lenses in each direction i,j) are defined modulo N. Thus all displacements modulo N will also be solutions. This means that if a given displacement (px, py) is a solution, then the displacement (px + aN, py + bN), with a and b integers, is also a solution.
Resolution of the projected image

The resolution of the projected image can be estimated by computing its maximum sampling step H, as for the conventional light-field camera made of a lens array arranged following a square lattice (presented in figure 6).

Figure 9 illustrates the normalized H/u values for a 2 by 2 lens array (N = 2) as a function of the fractional part of the disparity {w}. The corresponding characteristics of the light-field camera are the ones given above. The dashed line recalls the normalized H/u values obtained with a regular square lattice micro-lens array. Similarly, Figure 10 illustrates the normalized H/u values for a 3 by 3 micro-lens array (N = 3).

One can observe in figures 9 and 10 that the resolution of the re-focused image varies less than that of a conventional camera (dashed lines) with a regular square lattice. Therefore, a more regular resolution is obtained with the proposed micro-lens array. The regularity of the resolution increases with N.
Introduction to visual servoing
Embodiments will be described relating to the so-called eye-in-hand robot: the camera is mounted on or into the robot arm and rigidly follows the effector (as illustrated in Figure 15). A visual servoing algorithm uses images captured by the camera to control the motion of the robot. The robot is controlled by 3 translations and 3 rotations, the so-called 6 degrees of freedom. Visual servoing algorithms are control laws which drive the robot from an initial position to a reference position. Special features s are observed by the camera (image coordinates of interest points, for example). s* contains the desired or reference value of the features. The error function e(t) is the error between the desired features extracted from the image at the reference position, and the features extracted from a current position. The aim of the control law is to move the robot to minimize the error e(t). The control law is operated in a closed loop and causes the robot to converge to the reference position. The control law defines the relation between e(t) and the velocity of the robot v:

    v = -λ L̂s⁺ e    (27)

with v = (υ, ω), where υ is the linear camera velocity and ω is its angular velocity.
Ls is the interaction matrix which describes the relationship between the time variation of s and the camera velocity v (Ls is a Jacobian matrix). L̂s⁺ is an approximation of the pseudo-inverse of Ls.
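One step of this closed-loop law can be sketched as follows (a generic feature-based servoing iteration, not the patent's exact implementation; numpy's pseudo-inverse stands in for L̂s⁺, and the toy interaction matrix is illustrative):

```python
import numpy as np

def control_velocity(L, s, s_ref, lam=1.0):
    """One iteration of the control law v = -lambda * pinv(L) @ e,
    where e = s - s_ref is the feature error (equation 27)."""
    e = s - s_ref
    return -lam * np.linalg.pinv(L) @ e

# Toy example: one image point (2 features), 6 degrees of freedom.
L = np.zeros((2, 6))
L[0, 0] = L[1, 1] = -1.0      # pure-translation part of a point's matrix (Z = f = 1)
s = np.array([0.1, -0.2])     # current feature position
s_ref = np.zeros(2)           # reference feature position
v = control_velocity(L, s, s_ref)
# The commanded translation opposes the feature error (negative feedback):
assert v[0] > 0 and v[1] < 0
```

At each iteration a new image is acquired, the features are re-extracted, and a new v is computed, so the loop converges as e(t) shrinks.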
The various control algorithms which can be used to guide the robot from an initial position to a reference position are grouped into two different approaches: Image Based Visual Servoing (IBVS) and Position Based Visual Servoing (PBVS). In the second case, the 2D extracted features s are reverse-projected to estimate the 3D position of the object being tracked. This latter case is actually 3D servoing.
IBVS is performed only on 2D features extracted from the image. At a reference position the features s* are known and serve as a reference for the 2D features extracted from a current position. Figure 11 illustrates a pinhole model of the camera: a point P = (X,Y,Z) in 3D space is projected into the camera image plane at p = (x,y,f) through the pinhole located at (0,0,0). The coordinate system (X,Y,Z) is relative to the pinhole. The distance f from the camera image plane to the pinhole corresponds to the focal length of a perfect thin lens located at the pinhole. Using the perspective equation one deduces that (x,y) = (fX/Z, fY/Z).
By derivation of the projection equation and using the kinematic relations, one deduces the interaction matrix Lp of a given projected point:

    Lp = | -f/Z    0      x/Z    xy/f         -(f²+x²)/f    y  |
         |  0     -f/Z    y/Z    (f²+y²)/f    -xy/f         -x |    (28)

The interaction matrix is a Jacobian matrix which establishes the relation between the derivative of the 2D feature position (2 rows) and the derivative of the robot position (6 columns corresponding to the robot motion). (x,y) are real coordinates (not pixel coordinates), which are equal to zero on the main optical axis.
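A sketch of this point interaction matrix. The source rendering of equation (28) is unreadable, so the classical IBVS form with focal length f is assumed here:

```python
def point_interaction_matrix(x, y, Z, f):
    """2x6 interaction matrix of a projected point p = (x, y) (equation 28):
    rows give (dx/dt, dy/dt) as a function of (vx, vy, vz, wx, wy, wz)."""
    return [
        [-f / Z, 0.0, x / Z, x * y / f, -(f * f + x * x) / f, y],
        [0.0, -f / Z, y / Z, (f * f + y * y) / f, -x * y / f, -x],
    ]

# On the optical axis (x = y = 0), only translations along x/y and rotations
# about x/y move the projected point:
L = point_interaction_matrix(0.0, 0.0, Z=2.0, f=1.0)
assert L[0] == [-0.5, 0.0, 0.0, 0.0, -1.0, 0.0]
assert L[1] == [0.0, -0.5, 0.0, 1.0, 0.0, 0.0]
```

Note the explicit 1/Z dependence of the translational columns, which is why the control algorithm must estimate Z, as discussed next.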
The interaction matrix Lp of the projected point p depends notably on the distance Z. The value Z is the distance of the point P relative to the pinhole. Therefore, any control algorithm that uses this form of the interaction matrix must estimate or approximate Z. A light-field camera delivers various viewing angles which can be used to roughly evaluate the distance of the objects visible in the scene. This rough estimation can be used to evaluate the Z parameter of the interaction matrix.
Photometric Visual Servoing

Photometric Visual Servoing (PVS) is an IBVS control law where the features are the luminance values of the pixels. An estimation of the interaction matrix is required to control the motion of the actuator. The interaction matrix related to the luminance I(p,t) at time t is equal to (assuming a Lambertian scene):

    L_I = -∇I_x·L_x - ∇I_y·L_y    (29)

where L_x and L_y are respectively the first and second rows of the interaction matrix Lp given in (28), and ∇I_x and ∇I_y are the components along x and y of the gradient ∇I of the image I.
Images I recorded by the camera have a size of Nx x Ny pixels with a pixel pitch δ. The interaction matrix L_I is an NxNy by 6 matrix. It is built with the reference image I*. The row L_I[kNx + l], associated with the pixel (k,l), is computed by evaluating equation (29) at that pixel:

    L_I[kNx + l] = -dx·L_x - dy·L_y    (30)

where dx = I*(k-1,l)/δ - I*(k+1,l)/δ is the derivative along the k axis, and dy = I*(k,l-1)/δ - I*(k,l+1)/δ is the derivative along the l axis. (k,l) is a pixel coordinate which is converted into the normalized coordinate (x,y), such that (x,y) = (δ(k - Nx/2), δ(l - Ny/2)). The value Z is set to the average distance of the objects visible in the image.
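Equation (29) assembles one row of the photometric interaction matrix per pixel. A sketch with illustrative names and toy row values:

```python
def luminance_row(Lx, Ly, grad_x, grad_y):
    """Row of the photometric interaction matrix for one pixel (equation 29):
    L_I = -gradI_x * L_x - gradI_y * L_y (Lambertian assumption)."""
    return [-grad_x * a - grad_y * b for a, b in zip(Lx, Ly)]

# A pixel with a purely horizontal gradient reacts only through L_x:
Lx = [-1.0, 0.0, 0.1, 0.0, -1.0, 0.2]
Ly = [0.0, -1.0, 0.2, 1.0, 0.0, -0.1]
row = luminance_row(Lx, Ly, grad_x=2.0, grad_y=0.0)
assert row == [2.0, 0.0, -0.2, 0.0, 2.0, -0.4]
```

Pixels in uniform regions (zero gradient) contribute zero rows: only textured areas of the reference image constrain the motion estimate.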
Basic control law

At a current position of the actuator, and hence of the camera, an image I is captured. The error e between the visual features at the current position and the visual features at the reference position is equal to the image difference: e = I - I*. The velocity of the robot is computed with equation (27). The pseudo-inverse L̂_I⁺ of the interaction matrix L_I can be computed simply as L̂_I⁺ = (L_Iᵀ L_I)⁻¹ L_Iᵀ. Equation (27) becomes:

    v = -λ (L_Iᵀ L_I)⁻¹ L_Iᵀ (I - I*)    (31)

By construction, L_I is an NxNy by 6 matrix. Thus L_Iᵀ is a 6 by NxNy matrix, and L_Iᵀ L_I is a 6 by 6 matrix which is then inverted. With this formulation, -λ(L_Iᵀ L_I)⁻¹ L_Iᵀ is computed once, knowing the reference image I*. The computation of its product with I - I* requires just 6NxNy multiplications and additions, which are performed for each guiding iteration. The vector v contains the motion of the robot (6 values). λ is typically chosen equal to 1; larger values often make the robot unstable or oscillate around the reference position. The error I - I* converges to 0 as the actuator converges to the reference position. The cost function c is defined by the magnitude of this error. The actuator motion is controlled by v, which is updated at each iteration: guidance is performed in a closed loop; a new image is acquired and a new robot velocity is computed and applied. The control algorithm is operated iteratively, converging to a robot velocity substantially equal to zero.
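Putting equations (29)–(31) together, one guiding iteration can be sketched as follows (illustrative stand-in data; numpy assumed). The key point is that only the small 6x6 normal-equation matrix is inverted, never the large NxNy-by-6 matrix:

```python
import numpy as np

def pvs_velocity(L_I, I, I_ref, lam=1.0):
    """v = -lam * inv(L^T L) @ L^T @ (I - I_ref), equation (31).
    The 6x6 product L^T L is inverted; M is computable once from I_ref."""
    M = -lam * np.linalg.inv(L_I.T @ L_I) @ L_I.T
    return M @ (I - I_ref)

rng = np.random.default_rng(0)
L_I = rng.standard_normal((1000, 6))   # stand-in photometric interaction matrix
v_true = rng.standard_normal(6)
I_ref = rng.standard_normal(1000)
I = I_ref - L_I @ v_true               # image whose error is -L_I @ v_true
v = pvs_velocity(L_I, I, I_ref)
assert np.allclose(v, v_true)          # the law recovers the inducing velocity
```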
Interaction matrix dedicated to light-field camera
Let us consider a light-field camera made of N x N micro-lenses covering a sensor. The interaction matrix Lp from equation (28) needs to be updated for this light-field camera. Actually, one should derive N x N interaction matrices: one for each micro-lens. To start with, let us define the interaction matrix of a single decentred micro-lens.
The micro-lens is decentred versus the camera coordinate system. Figure 13 illustrates the pinhole model of a decentred lens. The pinhole is located at a distance (Xc, Yc) from the camera centre O. The coordinate system (O', x', y') of the camera imaging plane is decentred by (Xc, Yc, 0) versus the camera centre.
The difference from a centred camera resides in the rotation axes, which do not pass through the middle of the pinhole but through the middle of the camera O. The interaction matrix must include the decentring of the rotation axes versus the pinhole. Knowing this, the projection equation gives: x = f(X - Xc)/Z and y = f(Y - Yc)/Z. The interaction matrix L(Xc,Yc) becomes:

    L(Xc,Yc) = | -f/Z    0      x/Z    xy/f + xYc/Z         -(f²+x²)/f - xXc/Z    y + fYc/Z  |
               |  0     -f/Z    y/Z    (f²+y²)/f + yYc/Z    -xy/f - yXc/Z         -x - fXc/Z |    (32)

The light-field camera is made of N x N micro-lenses with N x N corresponding micro-images recorded at the sensor. N² interaction matrices L(Xc(i,j), Yc(i,j)) are therefore computed, one for each decentred micro-lens, where (Xc(i,j), Yc(i,j)) is the decentring of the micro-lens (i,j), as illustrated in Figure 14. A global interaction matrix is built by concatenating all the interaction matrices L(Xc(i,j), Yc(i,j)); it becomes a 2N² by 6 matrix (this matrix has N² times more rows than the L matrix defined in equation (28), because a point P forms N² points p on the camera image plane).
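A sketch of the decentred matrix with a sanity check. The entries below are one consistent derivation from the stated projection model (rotation axes through the camera centre); the patent's exact equation could not be recovered from the source text, so treat them as an assumption:

```python
def decentred_interaction_matrix(x, y, Z, f, Xc, Yc):
    """2x6 interaction matrix of a point seen through a pinhole decentred by
    (Xc, Yc) from the camera centre. One consistent derivation under the
    stated model; reduces to the classical matrix (28) when Xc = Yc = 0."""
    return [
        [-f / Z, 0.0, x / Z,
         x * y / f + x * Yc / Z, -(f * f + x * x) / f - x * Xc / Z, y + f * Yc / Z],
        [0.0, -f / Z, y / Z,
         (f * f + y * y) / f + y * Yc / Z, -x * y / f - y * Xc / Z, -x - f * Xc / Z],
    ]

def centred_interaction_matrix(x, y, Z, f):
    # Zero decentring recovers the single-lens point matrix of equation (28).
    return decentred_interaction_matrix(x, y, Z, f, 0.0, 0.0)

L = centred_interaction_matrix(0.1, -0.2, Z=2.0, f=1.0)
assert abs(L[0][4] - (-(1.0 + 0.01))) < 1e-12   # -(f^2 + x^2)/f term
assert abs(L[1][3] - (1.0 + 0.04)) < 1e-12      # (f^2 + y^2)/f term
```

Stacking one such 2x6 block per micro-lens yields the concatenated global interaction matrix described above.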
The interaction matrix Lp of a projected point p being characterized, the interaction matrix L_I of the luminance of the reference image pixels is computed as described by equation (29). A pixel of coordinate (k,l) is located under the micro-lens (i,j) = (⌈kN/Nx⌉, ⌈lN/Ny⌉), where ⌈.⌉ denotes the integer ceiling function. A pixel of coordinate (k,l) is converted into the normalized coordinate (x,y) such that (x,y) = (δ(k - Nx/2), δ(l - Ny/2)).
The actuator motion is controlled by v, which is updated at each iteration: guidance is performed in a closed loop; a new image is acquired and a new actuator velocity is computed and applied. The control algorithm is operated iteratively, converging to a robot velocity substantially equal to zero.
It will be understood that the present invention has been described above purely by way of example, and modification of detail can be made within the scope of the invention. Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.
Claims (19)
- CLAIMS 1. An apparatus comprising: a moveable actuator; an imaging device coupled to said actuator; a controller adapted to receive images from said imaging device, and control movement of said actuator, wherein said controller is configured to compare a reference image with a current image captured by the imaging device for a current position of the actuator, and to control said actuator to minimise differences between the current image and the reference image; and wherein said imaging device comprises: a plurality of micro-lenses arranged in a micro-lens array, relative to a regular square lattice, and a photo-sensor having an array of pixels, each micro-lens of the array projecting an image of a scene on an associated region of the photo-sensor forming a micro-image; wherein the micro-lens array comprises one or more micro-lens sub-sets, each sub-set comprising a two-dimensional array of micro-lenses displaced relative to the regular lattice according to a common pattern, the common pattern defining different displacements for each micro-lens of a sub-set.
- 2. Apparatus according to Claim 1, wherein the controller is adapted to control the velocity of the actuator, based on the difference between the reference image and the current image.
- 3. Apparatus according to Claim 2, wherein the velocity of the actuator is iteratively updated by the controller for each new image acquired by said imaging device.
- 4. Apparatus according to Claim 2 or Claim 3, wherein the velocity is calculated as a function of the difference between the reference image and the current image, the function being determined by an interaction matrix.
- 5. Apparatus according to Claim 4, wherein the interaction matrix depends on the distance of an object in a captured image from the sensor, and wherein an estimate of said distance is derived from said imaging device having a micro-lens array.
- 6. Apparatus according to any preceding claim, wherein the or each sub-set comprises a square array of NxN micro-lenses.
- 7. Apparatus according to Claim 6 as dependent upon Claim 4, wherein said interaction matrix is a combination of NxN decentred matrices.
- 8. Apparatus according to Claim 6 or Claim 7 wherein said common pattern defines a plurality of possible displacements for each micro-lens, each of said plurality being equivalent in modulo N.
- 9. Apparatus according to any preceding claim, wherein said common pattern defines each displacement as a function of the position (i,j) of each micro-lens within the sub-set.
- 10. Apparatus according to any preceding claim, wherein said common pattern defines each displacement as a function of the number of micro-lenses in each sub-set.
- 11.Apparatus according to any preceding claim, wherein said common pattern defines displacements in integer multiples of unit displacement vectors.
- 12. Apparatus according to Claim 11, wherein the magnitude (r) of said unit displacement vectors is a function of the focal distance of the micro-lenses.
- 13.Apparatus according to Claim 11 or Claim 12 wherein the magnitude of said unit displacement vectors is a function of the number of micro-lenses in each sub-set.
- 14. Apparatus according to any one of claims 11 to 13, wherein the multiple of said unit vectors for each micro-lens is a function of the position (i,j) of the micro-lens within the sub-set.
- 15. Apparatus according to any preceding claim wherein the displacement of at least one micro-lens in each sub-set is zero.
- 16. Apparatus according to any preceding claim, wherein the common pattern and the displacements are independent of the location of the sub-set in the micro-lens array.
- 17. Apparatus according to Claim 11, wherein said integer multiples (k, l) are given by k(i,j) ≡ Ai + Bj + K (mod N) and l(i,j) ≡ Ci + Ej + L (mod N), where NxN defines the size of the sub-set in number of micro-lenses, and the values A, B, C, E are determined as a solution of the equation: gcd((n²C' + nC'(A+E) + AC'E - B) mod N, N) = 1 for all n ∈ [0,N[, where C' is the inverse of C modulo N.
- 18. Apparatus according to any preceding claim, wherein said common pattern defines displacements in integer multiples of unit displacement vectors, and wherein the magnitude (r) of said unit displacement vectors is given by r = fδ/(dN), where f is the micro-lens focal distance, δ is the physical size of a sensor pixel, d is the distance between the micro-lens array and the sensor, and NxN defines the size of the sub-set in number of micro-lenses.
- 19. An apparatus comprising: a moveable actuator; an imaging device coupled to said actuator; a controller adapted to receive images from said imaging device, and control movement of said actuator, wherein said controller is configured to compare a reference image with a current image captured by the imaging device for a current position of the actuator, and to control said actuator to minimise differences between the current image and the reference image; and wherein said imaging device comprises: a plurality of micro-lenses arranged in a micro-lens array, relative to a regular square lattice, and a photo-sensor having an array of pixels, each micro-lens of the array projecting an image of a scene on an associated region of the photo-sensor forming a micro-image; wherein the micro-lens array comprises one or more micro-lens sub-sets, each sub-set comprising an array of NxN micro-lenses, each micro-lens of the sub-set having a focal distance f, wherein micro-lenses of each sub-set are displaced relative to the regular lattice according to a displacement pattern, said displacement pattern defining the displacement of each micro-lens as integer multiples (k, l) of unit vectors, said unit vectors having a magnitude r, wherein the magnitude r is a function of f/N.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB1319973.2A GB2520261B (en) | 2013-11-12 | 2013-11-12 | Visual servoing |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| GB201319973D0 (en) | 2013-12-25 |
| GB2520261A (en) | 2015-05-20 |
| GB2520261B (en) | 2017-10-18 |
Family
ID=49818517
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| GB1319973.2A Active GB2520261B (en) | 2013-11-12 | 2013-11-12 | Visual servoing |
Country Status (1)
| Country | Link |
|---|---|
| GB (1) | GB2520261B (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112549018A (en) * | 2020-11-03 | 2021-03-26 | 武汉数字化设计与制造创新中心有限公司 | Robot line laser rapid hand-eye calibration method |
| CN120751232A (en) * | 2025-06-30 | 2025-10-03 | 上海安科迪智能科技有限公司 | High-resolution standardized microarray compound eye lens module and light field regulation and control system |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4437114A (en) * | 1982-06-07 | 1984-03-13 | Farrand Optical Co., Inc. | Robotic vision system |
| US20100265381A1 (en) * | 2009-04-16 | 2010-10-21 | Sony Corporation | Imaging device |
| US20110228142A1 (en) * | 2009-10-14 | 2011-09-22 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Device, image processing device and method for optical imaging |
| DE202013001558U1 (en) * | 2013-02-18 | 2013-03-07 | Grenzebach Maschinenbau Gmbh | Device for picking up horizontally stored commission goods |
| US20130127002A1 (en) * | 2010-07-12 | 2013-05-23 | Fujifilm Corporation | Solid state imaging device |
| US20130321581A1 (en) * | 2012-06-01 | 2013-12-05 | Ostendo Technologies, Inc. | Spatio-Temporal Light Field Cameras |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| GB2501936A (en) | Micro-lens array with micro-lens subsets displaced from a regular lattice pattern | |
| US7553023B2 (en) | Multi-dimensional imaging apparatus, methods, and systems | |
| GB2521429A (en) | Visual Servoing | |
| US8559756B2 (en) | Radiance processing by demultiplexing in the frequency domain | |
| JP5618943B2 (en) | Image processing method, imaging apparatus, image processing apparatus, and image processing program | |
| US10362229B2 (en) | Multi-aperture imaging device, portable device and method of producing a multi-aperture imaging device | |
| KR20180042342A (en) | Multi-aperture imaging device comprising an optical substrate | |
| US20120050562A1 (en) | Digital imaging system, plenoptic optical device and image data processing method | |
| JP2020122969A (en) | Multi-aperture image capturing device, image capturing system, and method of providing multi-aperture image capturing device | |
| KR20190121853A (en) | Method for making a multifocal imaging device, an imaging system, and a multifocal imaging device available | |
| JP5767502B2 (en) | 3D image display device | |
| US20210314548A1 (en) | Device comprising a multi-aperture imaging device for generating a depth map | |
| WO2010055643A1 (en) | Imaging device | |
| US7796152B2 (en) | Multi-dimensional imaging | |
| GB2505954A (en) | Micro lens array with displaced micro-lenses suitable for a light-field colour camera | |
| Shin et al. | Computational implementation of asymmetric integral imaging by use of two crossed lenticular sheets | |
| US20140354777A1 (en) | Apparatus and method for obtaining spatial information using active array lens | |
| GB2520261A (en) | Visual servoing | |
| TWI794571B (en) | Device comprising a multi-aperture imaging device for accumulating image information | |
| JP5553863B2 (en) | Image processing method, imaging apparatus, image processing apparatus, and image processing program | |
| JP2011239372A (en) | Imaging apparatus | |
| JP5721891B2 (en) | Imaging device | |
| Oberdörster et al. | Interactive alignment and image reconstruction for wafer-level multi-aperture camera systems | |
| Vanijja et al. | Omni-directional stereoscopic images from one omni-directional camera | |
| JP2025070883A (en) | 3D image display device |