GB2638245A - Telepresence system - Google Patents

Telepresence system

Info

Publication number
GB2638245A
GB2638245A (application GB2402230.3A)
Authority
GB
United Kingdom
Prior art keywords
camera
image
view
point
display
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2402230.3A
Inventor
Murrell Richard
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to GB2402230.3A priority Critical patent/GB2638245A/en
Priority to PCT/GB2025/050303 priority patent/WO2025172728A1/en
Publication of GB2638245A publication Critical patent/GB2638245A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G02 OPTICS
    • G02B OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00 Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01 Head-up displays
    • G02B27/017 Head mounted

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Controls And Circuits For Display Device (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

A telepresence system comprising a display screen for displaying a composition image to a user, wherein the display screen comprises a plurality of display nodes, and wherein each display node corresponds to a camera of a camera array and the display nodes are spaced according to the geometry of the camera array. An eye tracking device identifies the three-dimensional location of an eye with respect to the display nodes, and a controller receives a plurality of image frames, wherein each image frame corresponds with a camera point of view of a camera of a camera array, and determines the location of an extrapolated point of view a distance from the rear side of a camera plane formed by the camera points of view that corresponds with the eye location. An extrapolated frame location is identified on each image frame that corresponds with a light ray captured by the corresponding camera at its camera point of view, wherein the light ray is aligned with a vector from the extrapolated point of view to the corresponding camera point of view. A composition image comprising a plurality of virtual centre of views arranged to correspond with the arrangement of the camera points of view is provided, wherein each virtual centre of view corresponds with a camera point of view, and the controller positions each image frame on the composition image such that its extrapolated frame location is coincident with its image frame's corresponding virtual centre of view. Portions of the composition image between the virtual centre of views are interpolated, and the composition image is displayed such that the virtual centre of views are co-located with corresponding display nodes.

Description

Telepresence system
The present invention relates to camera arrays, telepresence systems and devices, and three-dimensional effect displays.
Background
Use of video conferencing technologies has become commonplace for both personal and business applications. Video conferencing typically involves seeing a live video of the other participant(s), such that a face-to-face interaction is simulated. As such, facial expressions and gestures can make the interaction more realistic than voice and sound alone. However, there are limitations with two-dimensional video conferencing (for example where there are no three-dimensional effects).
Telepresence devices are being considered to increase the feeling of a realistic face-to-face interaction. Telepresence solutions that involve three-dimensional effects in the video stream are desirable.
US9681096 relates to immersive augmented reality and live display wall using a camera array.
US10554928 relates to a telepresence device comprising a camera array that displays life-like images of a remote participant that are dynamically responsive in real-time to movement of the local participant, present a life-like geometry, and preserve eye gaze.
US10327014 relates to a telepresence terminal that includes a lenticular display, an image sensor, an infrared emitter, and an infrared depth sensor.
An improved telepresence system is desired.
According to an aspect there is provided a method of providing an extrapolated view. The method comprises the steps of: receiving a plurality of image frames, wherein each image frame corresponds to a camera point of view of a camera of a camera array; determining a location of an extrapolated point of view a distance from the rear side of a camera plane formed by the camera points of view; identifying an extrapolated frame location on each image frame that corresponds with a light ray captured by a corresponding camera, wherein the light ray is aligned with a vector from the extrapolated point of view to the corresponding camera point of view; providing a composition image comprising a plurality of virtual centre of views arranged to correspond with the arrangement of the camera points of view, wherein each virtual centre of view corresponds with a camera point of view; and positioning each image frame on the composition image such that its extrapolated frame location is coincident with its image frame's corresponding virtual centre of view.
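By way of a non-limiting illustration only, the geometry of these steps may be sketched in Python. The sketch below is not taken from the disclosure: it assumes a flat camera plane at z = 0, cameras facing the +z direction, an ideal pinhole model, and invented names (camera_povs, focal_px, centre_px, px_per_metre).

    import numpy as np

    def extrapolated_frame_location(camera_pov, extrapolated_pov, focal_px, centre_px):
        # Pixel on this camera's image frame hit by the light ray aligned with the
        # vector from the extrapolated point of view to the camera point of view.
        ray = camera_pov - extrapolated_pov              # extends onward into the scene
        u = centre_px[0] + focal_px * ray[0] / ray[2]    # pinhole projection
        v = centre_px[1] + focal_px * ray[1] / ray[2]
        return np.array([u, v])

    def position_frames(camera_povs, extrapolated_pov, focal_px, centre_px, px_per_metre):
        # For each image frame, the offset (in composition pixels) that places its
        # extrapolated frame location on its corresponding virtual centre of view.
        offsets = {}
        for i, pov in enumerate(camera_povs):
            efl = extrapolated_frame_location(pov, extrapolated_pov, focal_px, centre_px)
            vcov = pov[:2] * px_per_metre   # virtual centre of views copy the array geometry
            offsets[i] = vcov - efl
        return offsets

    # 4 x 4 array with 10 cm pitch; viewer 60 cm behind the camera plane, off-centre.
    grid = [np.array([x * 0.1, y * 0.1, 0.0]) for x in range(4) for y in range(4)]
    eye = np.array([0.12, 0.15, -0.6])
    print(position_frames(grid, eye, focal_px=800.0, centre_px=(640.0, 360.0), px_per_metre=5000.0))

Note that, consistent with the text above, no depth information about the scene is used: the extrapolated frame location depends only on the relative geometry of the extrapolated point of view and the camera points of view.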
The plurality of image frames may comprise one or more first image frames and a plurality of second image frames. The plurality of second image frames may be lower definition than the plurality of first image frames. A lower definition image may have a smaller file size per pixel. The number of second image frames may be greater than the number of first image frames. The step of positioning each image frame on the composition image may comprise enhancing portions of the composition image. Enhancing may enhance portions corresponding to second image frames with information from the first image frames. Enhancing may not alter the locations of features on the composition image. The second image frames may be line images. The second image frames may be vector images. The second image frames may have 1-bit colour. The first image frames may have greater than 1-bit colour. Enhancing the composition image may comprise enhancing the colour of the composition image.
The plurality of image frames may further comprise a plurality of third image frames. The third image frames may be lower definition than the plurality of first image frames. The third image frames may be higher definition than the plurality of second image frames. The number of third image frames may be greater than the number of first image frames. The number of third image frames may be less than the number of second image frames. The number of second image frames may be greater than 200. The number of third image frames may be greater than 100.
The step of obtaining an extrapolated point of view may comprise identifying an eye location of a user in relation to display nodes on a display. Each display node may correspond to a (e.g. remote) camera (e.g. a camera point of view) of the camera array. The method may further comprise displaying the composition image on the display such that the virtual centre of views of the composition image are coincident with the display nodes.
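A minimal sketch of this mapping, assuming the display nodes preserve the camera-array geometry, and using an invented coordinate convention (origin at the central display node, camera plane at z = 0, rear side at negative z):

    import numpy as np

    def eye_to_extrapolated_pov(eye_in_display_frame):
        # Because display nodes are spaced according to the camera-array geometry,
        # the eye's (x, y) offset carries over unchanged; its distance in front of
        # the screen becomes a distance behind the camera plane (negative z).
        x, y, dist_from_screen = eye_in_display_frame
        return np.array([x, y, -abs(dist_from_screen)])

    print(eye_to_extrapolated_pov(np.array([0.05, -0.02, 0.55])))
    # [ 0.05 -0.02 -0.55]: 55 cm behind the camera plane, mirroring the viewer.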
The display may be provided and/or viewed using an augmented or virtual reality headset. The display may comprise a virtual screen a distance from the user. The display nodes may be located on the virtual screen. The extrapolated point of view may correspond to an eye location relative to the position of the display nodes.
The method may further comprise capturing images of the local scene in front of the display screen using a local camera array. The method may further comprise transmitting the images to a remote telepresence display device for enabling two-way telepresence. The method may further comprise receiving geometry from a remote telepresence device comprising a separation distance between a remote display screen and a remote camera plane. The method may further comprise actuating the local display screen (or locating the local virtual screen) to be a distance from the camera plane equal to the received separation distance. A separation distance may be constant across the display screen and/or camera plane. A separation distance may vary across the display screen and/or camera plane.
After the step of positioning each image frame, the method may further comprise applying transparency to overlapping portions of image frames. The transparency may be such that the combined effect of overlapping transparencies renders the composition image opaque across the area covered by the image frames.
The method may further comprise applying an interpolation algorithm to resolve portions of the composition image between the virtual centre of views (for example after the step of positioning each image frame). Applying an interpolation algorithm may comprise identifying the same feature in a plurality of image frames. The method may further comprise positioning the feature at a single interpolated location on the composition image. The interpolation algorithm may comprise artificial intelligence and/or an artificial neural network.
The step of positioning may comprise transforming (for example warping, stretching or shearing) a portion of such image frames distal to their corresponding extrapolated frame location along radial lines extending away from the extrapolated frame location.
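One plausible reading of this radial transformation, sketched in Python; the linear gain profile and the untouched inner radius are assumptions, as the text does not fix a particular function:

    import numpy as np

    def radial_warp(points, efl, inner_radius, gain):
        # points: (N, 2) pixel coordinates; efl: the extrapolated frame location.
        offsets = points - efl
        r = np.linalg.norm(offsets, axis=1, keepdims=True)
        excess = np.maximum(r - inner_radius, 0.0)       # only the distal portion moves
        scale = (r + gain * excess) / np.maximum(r, 1e-9)
        return efl + offsets * scale                     # stretch along radial lines

    pts = np.array([[100.0, 100.0], [400.0, 100.0]])
    print(radial_warp(pts, efl=np.array([120.0, 110.0]), inner_radius=50.0, gain=0.2))
    # The first point (inside the inner radius) is unchanged; the second moves outward.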
The step of obtaining an extrapolated point of view may comprise identifying a plurality of eye locations of one or more users with respect to the display device. The method of constructing a composition image or providing an extrapolated view may be performed for each extrapolated point of view. The display device may provide a different composition image to each eye of a user.
According to an aspect there is provided a camera array. The camera array may be suitable for capturing images according to another aspect. The camera array comprises one or more first definition cameras and a plurality of second definition cameras, and wherein the first definition cameras have higher definition than the second definition cameras. A higher definition camera may output an image with larger file size per pixel when at maximum camera settings.
The camera array may be planar. The camera array may have a height less than 3 metres and a width less than 3 metres. The number of cameras may be greater than 200. The number of second definition cameras may be greater than the number of first definition cameras.
The first definition cameras may output a file size per pixel when at maximum camera settings greater than 5 times the second definition cameras.
The camera array may further comprise a controller. The controller may be configured to convert the images from the second definition cameras into monochrome and/or line and/or vector images.
The camera array may further comprise a plurality of third definition cameras. The third definition cameras may have higher definition than the second definition cameras. The third definition cameras may have lower definition than the first definition cameras. The number of second definition cameras may be greater than the number of third definition cameras.
According to an aspect there is provided a telepresence system comprising: a display screen for displaying a composition image to a user, wherein the display screen comprises a plurality of display nodes, and wherein each display node corresponds to a camera of a camera array and the display nodes are spaced according to the geometry of the camera array; an eye tracking device for identifying the three-dimensional location of an eye with respect to the display nodes; and a controller, wherein the controller is configured to: receive an eye location from the eye tracking device; receive a plurality of image frames, wherein each image frame corresponds with a camera point of view of a camera of a camera array; determine a location of an extrapolated point of view a distance from the rear side of a camera plane formed by the camera points of view that corresponds with the eye location; identify an extrapolated frame location on each image frame that corresponds with a light ray captured by a corresponding camera at its camera point of view, wherein the light ray is aligned with a vector from the extrapolated point of view to the corresponding camera point of view; provide a composition image comprising a plurality of virtual centre of views arranged to correspond with the arrangement of the camera points of view, wherein each virtual centre of view corresponds with a camera point of view; position each image frame on the composition image such that its extrapolated frame location is coincident with its image frame's corresponding virtual centre of view; interpolate portions of the composition image between the virtual centre of views; and display the composition image on the display screen such that the virtual centre of views are co-located with corresponding display nodes.
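As a non-authoritative sketch of the controller's positioning step, the fragment below pastes each image frame onto a composition canvas so that its extrapolated frame location lands on its virtual centre of view; overlaps are simply averaged here, standing in for the transparency and interpolation treatments described later. All names and shapes are illustrative.

    import numpy as np

    def compose(frames, efls, vcovs, canvas_shape):
        # frames: list of HxWx3 arrays; efls[i]: (u, v) extrapolated frame location
        # of frame i; vcovs[i]: (u, v) virtual centre of view on the canvas.
        canvas = np.zeros(canvas_shape, dtype=np.float32)
        weight = np.zeros(canvas_shape[:2], dtype=np.float32)
        for frame, efl, vcov in zip(frames, efls, vcovs):
            h, w = frame.shape[:2]
            x0 = int(round(vcov[0] - efl[0]))            # top-left corner placing
            y0 = int(round(vcov[1] - efl[1]))            # efl exactly on vcov
            xs, ys = max(x0, 0), max(y0, 0)
            xe, ye = min(x0 + w, canvas_shape[1]), min(y0 + h, canvas_shape[0])
            if xe <= xs or ye <= ys:
                continue                                 # frame falls off the canvas
            canvas[ys:ye, xs:xe] += frame[ys - y0:ye - y0, xs - x0:xe - x0]
            weight[ys:ye, xs:xe] += 1.0
        return canvas / np.maximum(weight, 1.0)[..., None]   # average the overlaps

    canvas = compose(frames=[np.full((90, 160, 3), 200, np.uint8)] * 2,
                     efls=[(80.0, 45.0), (80.0, 45.0)],
                     vcovs=[(200.0, 150.0), (400.0, 150.0)],
                     canvas_shape=(300, 600, 3))
    print(canvas.shape)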
The telepresence system may further comprise a local camera array. The display screen may be located proximal the local camera array. Each camera of the local camera array may be positioned with respect to the display screen such that it can capture images through the display screen.
The telepresence system may comprise an augmented or virtual reality headset. The display screen may be a virtual screen positioned in the environment of the user when viewed through the headset.
The display screen may comprise holes. Each local camera may comprise a capture portion positioned within a hole. The display screen may be semi-transparent. The display screen may be moveable by actuators to increase or decrease a separation distance from the camera array.
According to an aspect there is provided two telepresence displays according to another aspect for two-way telepresence communication. According to an aspect there is provided a controller configured to perform a method of another aspect. According to an aspect there is provided a camera array or display configured to perform the method of another aspect.
An image with a lower definition (for example compared to a higher definition image) may have a smaller file size per pixel. An image with a lower definition (for example compared to a higher definition image) may have a smaller file size per angle (or comparable (e.g. angular) cone) of field of view.
A comparison between a lower definition image and a higher definition image may assume both images are of (and/or taken facing) the same scene, direction and point of view (and optionally field of view).
An image with a lower definition (for example compared to a higher definition image) may have a smaller number of pixels and/or pixel density.
An image with a lower definition may have a file size of (for example less than or between two values) 0.7 or 0.6 or 0.5 or 0.4 or 0.3 or 0.2 or 0.1 or less than 0.1 times that of a higher definition image. An image with a lower definition may have a file size of (for example less than or between two values) 0.7 or 0.6 or 0.5 or 0.4 or 0.3 or 0.2 or 0.1 or less than 0.1 times that of a higher definition image per pixel or per angle (or comparable angular cone) of field of view.
An image with a lower definition (for example compared to a higher definition image) may have a lower colour depth and/or a lower chrominance and/or a higher compression (e.g. ratio) and/or lower overall proportion (for example density) of high frequency image data (for example across the image). An image with a lower definition may have a reduced discretisation or gradation or proportion of shades or colour compared to a higher definition image. An image with a lower definition may have a lower discretisation or gradation or proportion of half-tone or percentage tone regions.
An image with lower definition may have a lower proportion of its file-size corresponding to higher definition image characteristics (e.g. colour information or high frequency information or gradation regions).
A camera point of view may be (or be co-located with) the nodal point of a camera.
Positioning each image frame may comprise positioning a portion of the image frame (e.g. a portion proximal the extrapolated frame location).
The second image frames and/or the third image frames may be (for example greater than and/or less than or between) 1-bit or 2-bit or 3-bit or 4-bit or 5-bit or 6-bit or 7-bit or 8-bit colour. The first image frames may have (for example greater than and/or less than or between) 8-bit or 16-bit or 24-bit or 30-bit or 36-bit or 48-bit colour.
The number of second and/or third image frames may be greater than the number of first image frames. The number of second and/or third image frames may be greater than the number of first image frames by a factor of 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 25 or 30 or 35 or 40 or 50 or 75 or 100 or 150 or 200 or more.
The number of image frames may be (for example greater than) 140 or 150 or 160 or 170 or 180 or 190 or 200 or 250 or 300 or 350 or 400 or 450 or 500 or 600 or 700 or 800 or 900 or 1000 or 1500 or 2000 or 2500 or 3000 or 4000 or 5000 or greater than 5000. The number of image frames may be greater than 140 and less than 150 or 160 or 170 or 180 or 190 or 200 or 250 or 300 or 350 or 400 or 450 or 500 or 600 or 700 or 800 or 900 or 1000 or 1500 or 2000 or 2500 or 3000 or 4000 or 5000.
The number of second image frames may be (for example greater than) 140 or 150 or 160 or 170 or 180 or 190 or 200 or 250 or 300 or 350 or 400 or 450 or 500 or 600 or 700 or 800 or 900 or 1000 or 1500 or 2000 or 2500 or 3000 or 4000 or 5000 or greater than 5000. The number of third image frames may be (for example greater than) 10 or 20 or 30 or 40 or 50 or 60 or 70 or 80 or 90 or 100 or 200 or 300 or 500 or 1000 or greater than 1000.
The number of first image frames may be (for example greater than or between) 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 12 or 15 or 20 or 25 or 30 or greater than 30.
The number of second image frames may be greater than the number of third image frames. The number of second image frames may be greater than the number of third image frames by a factor of 1.5 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or greater than 10.
An extrapolated point of view may lie (e.g. substantially) within a back projected field of view of a camera of the camera array. An extrapolated point of view may be located an extrapolation separation distance from the camera plane (for example away from the camera rear side). The extrapolation separation distance may be (or correspond to) a (for example minimum or maximum) distance between adjacent cameras. The extrapolation separation distance may be (or correspond to) a distance between the closest two cameras to the extrapolated point of view. The extrapolated point of view may be a representation of a point of view of a user. An extrapolated point of view (e.g. with respect to camera point of views) may be determined from a user eye location with respect to display point of views.
Display nodes may be display point of views (e.g. as described herein). Display nodes may be moveable in relation to the display screen (for example the display device frame or screen border). Display nodes may be arranged on the display screen according to the arrangement (for example geometry and/or relative locations and/or separation distances) of camera point of views of the camera array (e.g. the remote camera array capturing images for display on the display screen) and/or composition point of views and/or composition centre of views.
The extrapolated frame location may be (or represent) the location (or pixel or nearest pixel) on an image frame that captures the part of the scene intersected by a line from the camera point of view corresponding to (for example aligned with or an extension or in the same direction as) a line (or vector) between the extrapolated point of view and the corresponding camera point of view (e.g. an extrapolated camera view). The extrapolated frame location may be (or represent) the location (or pixel) captured by a line (or light ray) at an angle to the camera direction (and/or, for example, within the camera field of view) that is an extension of a corresponding extrapolated camera view (e.g. a line (or vector) between the extrapolated point of view and the corresponding camera point of view). The extrapolated frame location may be (or represent) the location (or pixel or nearest pixel) on an image frame that corresponds with a light ray intersecting with the camera point of view, and at an angle (for example which may be zero) to the optical axis of the camera, and aligned with an extrapolated camera view. An extrapolated frame location may be determined without requiring knowledge of the depth of the scene captured. An extrapolated frame location may correspond to a vector formed by extending an extrapolated camera view through the camera towards the scene.
An extrapolated frame location may be determined such that it represents a line or vector or view from a camera's point of view at an angle (including zero) to the optical (e.g. zenith) axis.
A display screen may display a composition image such that the distances (or e.g. geometry) on the display screen between virtual centre of views correspond with the distances between corresponding camera point of views (or cameras) of a camera array (for example that captured the data for display on the display screen) (for example such that the image is displayed according to the camera array geometry or such that the camera geometry is preserved). A display screen may be co-located with a camera plane (for example for two-way communication). A display screen may be offset a small distance from a camera plane (for example for two-way communication). A method may comprise sending or receiving (e.g. to or from a remote display device or telepresence device) a separation distance between a display screen (for example a virtual display screen) and a camera plane. A method may comprise locating or adjusting a (e.g. local) display screen according to the received separation distance (e.g. from the remote display screen or telepresence device).
A line image may be a vector file or vector image or vectorised image. A vector image or vector file may be constructed using algorithms or formulas (or for example a method other than defining individual pixels). A vector image may be stored in a vector file format (e.g. ".ai" or ".eps" or ".svg" or other vector file format). Advantageously line images or vector images may efficiently preserve parallax information (e.g. for efficiently reconstructing many points of views).
A line image may be comprised substantially of edges or boundaries or borders of features in the image. A line image may be monochromatic or 1-bit or be low bit colour. A line image may be only (or substantially only) comprised of white space (or a single colour) and lines. The proportion of white space (or a single colour) may be greater than 70% or 80% or 90% or 95% of the black space (or the colour(s) of the lines). White space may be comprised of only wholly white pixels (or a single colour). A line image may define the edges or boundaries or borders of features in the image. Edges or boundaries or borders may be identified by feature identification algorithms or edge detection or artificial intelligence or sharper changes in contrast or colour or luminance. A line image may comprise (e.g. substantially) no regions where an intermix (for example substantially evenly or randomly distributed mix of pixels) of different colour pixels may create a half-tone or gradation. A line may be represented by a series of single pixels extending across an image plus an additional width or thickness. The width or thickness of a line may be 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 pixels.
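As a hedged illustration of producing such a 1-bit line image, the sketch below uses Canny edge detection from OpenCV; the disclosure does not mandate any particular edge detector, and the thresholds here are arbitrary.

    import cv2
    import numpy as np

    def to_line_image(frame_bgr):
        # Returns a 1-bit (boolean) image: True on feature edges, False on white space.
        grey = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(grey, threshold1=100, threshold2=200)
        return edges > 0

    frame = np.zeros((240, 320, 3), dtype=np.uint8)
    cv2.rectangle(frame, (80, 60), (240, 180), (255, 255, 255), -1)   # a filled feature
    lines = to_line_image(frame)
    print(f"line pixels: {lines.mean():.1%} of the image")            # a small share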
Enhancing may comprise extracting information or properties from a higher definition image and applying that information to similar features of lower definition portions of the composition image. Enhancing may comprise feature identification. Enhancing may comprise colorization. Enhancing may comprise transforming and inserting high frequency image portions from a high definition image frame to corresponding portions of a composition image. High definition images or cameras may contribute or enhance colour and/or definition and/or detail and/or image properties and/or sharpness to a composition image (for example a composition image created from a majority of low definition images). Enhancing may be performed before or after the image frames are positioned on the composition image. Enhancing may be performed after the step of interpolating. Enhancing may only enhance non-positional or non-parallax features or properties. Enhancing may not significantly change the locations (or parallax) of lines or features. Enhancing may not correct parallax (for example because higher definition images may not have improved parallax information (e.g. compared to a closer image frame to a virtual centre of view) but do have improved non-parallax information (e.g. colour, detail, etc)). Enhancing may be performed after first and second image frames are positioned. A composition image may comprise first and second image frames.
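A minimal sketch of one such enhancement, assuming a high-definition frame has already been aligned to a low-definition portion of the composition image: chrominance is borrowed from the first (high-definition) frame, while luminance, which carries the edge and parallax geometry, is kept from the low-definition data. The names and the alignment step are assumptions.

    import cv2
    import numpy as np

    def colourise_from_high_def(low_def_grey, high_def_bgr):
        # low_def_grey: HxW uint8; high_def_bgr: HxWx3 uint8, already aligned.
        ycrcb = cv2.cvtColor(high_def_bgr, cv2.COLOR_BGR2YCrCb)
        ycrcb[..., 0] = low_def_grey          # keep low-definition geometry in luminance
        return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

    low = np.full((240, 320), 128, dtype=np.uint8)
    high = np.zeros((240, 320, 3), dtype=np.uint8)
    high[..., 2] = 180                        # a reddish high-definition frame
    print(colourise_from_high_def(low, high).shape)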
The step of positioning each image frame on the composition image may comprise applying transparency to each image frame distal to its extrapolated frame location. Transparency may be applied to portions of image frames that overlap with other image frames. Transparency may increase with radial distance from an extrapolated frame location. The combined transparency at a portion (e.g. a cluster of pixels) on a composition image where a plurality of image frames are overlapped may sum to render the image opaque within the portion. After transparency has been applied to portions where image frames overlap, the composition image may be opaque across all area covered by an image frame. The transparency of each image frame within a portion where such image frames overlap may be proportional to the relative distances to such image frame's extrapolated frame locations.
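The proportionality rule above can be read, for example, as inverse-distance weighting; the sketch below computes per-frame opacities at one pixel of an overlap region, normalised so that the composition stays opaque. The exact weighting function is an assumption.

    import numpy as np

    def overlap_weights(dists_to_efl):
        # dists_to_efl[i]: radial distance from this pixel to frame i's
        # extrapolated frame location; nearer frames are more opaque.
        w = 1.0 / np.maximum(dists_to_efl, 1e-6)
        return w / w.sum()                    # opacities sum to 1 (opaque overall)

    # A pixel 50 px from frame A's location and 150 px from frame B's:
    print(overlap_weights(np.array([50.0, 150.0])))   # -> [0.75 0.25]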
An interpolation algorithm may comprise feature identification and/or feature extraction. An interpolation algorithm may comprise separating a feature from an image and repositioning a feature. An interpolation algorithm may be applied to portions where image frames overlap. An interpolation algorithm may alter the locations of features on an image frame within portions where image frames overlap.
An interpolation algorithm may be applied to portions other than extrapolated frame locations (and/or regions around extrapolated frame locations or virtual centres of view). Each virtual centre of view may comprise a virtual circular region extending a radius away from the virtual centre of view. An interpolation algorithm may be applied to portions of the composition image outside of a virtual circular region.
An interpolation algorithm may extend an image beyond an image frame boundary. An interpolation algorithm may be applied to a portion of an image frame or composition image. An interpolation algorithm may apply a transformation (for example warping, stretching, shearing or feature extraction) to a portion of an image frame. An interpolation algorithm may apply one or more different transformations to one or more different portions of an image frame.
An interpolation algorithm may provide an averaging function for areas where two or more image frames overlap.
An interpolation algorithm may identify a feature in adjacent image frames that appears in two or more different places on the composition image (for example after the image frames have been positioned on the composition image). The interpolation algorithm may resolve an identified feature to a single location on the composition image. The interpolation algorithm may resolve an identified feature by transforming (for example warping, stretching or shearing) portions of corresponding image frames and/or by feature identification and/or feature extraction.
An interpolation algorithm may resolve a location of a feature by (e.g. on two or more image frames) determining and/or extending a radial line between an extrapolated frame location and the location of the feature on its corresponding image frame. An interpolation algorithm may resolve a location of a feature to where the radial lines cross and, for example, position the feature there. An interpolation algorithm may apply a transformation to an image frame such that a feature moves radially away from the extrapolated frame location (for example along radial lines).
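A minimal worked example of this radial-line construction, with invented coordinates: each line runs from a frame's extrapolated frame location through the feature's position in that (already positioned) frame, and the feature is resolved to the crossing point.

    import numpy as np

    def intersect_radial_lines(efl_a, feat_a, efl_b, feat_b):
        da, db = feat_a - efl_a, feat_b - efl_b          # radial line directions
        A = np.column_stack([da, -db])                   # solve efl_a + t*da = efl_b + s*db
        if abs(np.linalg.det(A)) < 1e-9:
            return (feat_a + feat_b) / 2                 # parallel lines: take the midpoint
        t, _ = np.linalg.solve(A, efl_b - efl_a)
        return efl_a + t * da

    p = intersect_radial_lines(np.array([100.0, 100.0]), np.array([160.0, 130.0]),
                               np.array([300.0, 100.0]), np.array([260.0, 120.0]))
    print(p)   # -> [200. 150.], the single resolved location for the feature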
The number of second or third definition cameras may be greater than the number of first definition cameras. The number of second or third definition cameras may be greater than the number of first definition cameras by a factor of 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 25 or 30 or 35 or 40 or 50 or 75 or 100 or 150 or 200 or more. A higher factor may increase the number of points of view and/or reduce bandwidth or processing requirements.
The camera array may have a height of (for example less than or greater than) 0.5 or 1 or 1.5 or 2 or 3 or 4 or 5 metres. The camera array may have a width of (for example less than or greater than) 0.5 or 1 or 1.5 or 2 or 3 or 4 or 5 metres. The camera array (or a substantial portion of the camera array) may have an average (or minimum or maximum) distance of (for example less than or greater than or between) 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 centimetres between adjacent (for example the nearest) camera(s) (or portions of cameras of similar type).
The number of cameras in the camera array may be (for example greater than) 140 or 150 or 160 or 170 or 180 or 190 or 200 or 250 or 300 or 350 or 400 or 450 or 500 or 600 or 700 or 800 or 900 or 1000 or 1500 or 2000 or 2500 or 3000 or 4000 or 5000 or greater than 5000. The number of cameras in the camera array may be greater than 140 and less than 150 or 160 or 170 or 180 or 190 or 200 or 250 or 300 or 350 or 400 or 450 or 500 or 600 or 700 or 800 or 900 or 1000 or 1500 or 2000 or 2500 or 3000 or 4000 or 5000.
The number of second definition cameras may be (for example greater than) 140 or 150 or 160 or 170 or 180 or 190 or 200 or 250 or 300 or 350 or 400 or 450 or 500 or 600 or 700 or 800 or 900 or 1000 or 1500 or 2000 or 2500 or 3000 or 4000 or 5000 or greater than 5000. The number of third definition cameras may be (for example greater than) 10 or 20 or 30 or 40 or 50 or 60 or 70 or 80 or 90 or 100 or 200 or 300 or 500 or 1000 or greater than 1000.
The number of first definition cameras may be 1 or greater than 1 or (for example greater than or less than or between) 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 12 or 15 or 20 or 25 or 30 or greater than 30.
The number of second definition cameras may be greater than (or less than) the number of third definition cameras. The number of second definition cameras may be greater than the number of third definition cameras by a factor of 1.5 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or greater than 10. The number of second definition cameras may be greater than the number of third definition cameras by a factor of 20 or 30 or 40 or 50 or 60 or 70 or 80 or 90 or 100 or greater than 100.
A camera array may comprise a first portion of cameras with camera directions (e.g. substantially) normal (or perpendicular) to the camera plane. A camera array may comprise a second portion of cameras with non-perpendicular camera directions with respect to the camera plane. The first portion of cameras may outnumber the second portion of cameras by a factor of (for example greater than or less than or between two values) 1.5 or 2 or 3 or 4 or 5 or more than 5.
The method may comprise receiving camera array geometry information or storing pre-determined camera array geometry information.
Coincident or co-located may refer to two features that have the same location or co-ordinates.
The controller of the display may be configured to perform any of the method steps according to an aspect.
A display screen may have holes where no pixels are located. The holes may extend through the display screen from front to back. A camera may be located in a hole such that its nodal point or lens or front of lens is (e.g. substantially) located on a plane formed by the display screen. A camera located in a hole may be located substantially behind the screen (for example a camera body may be located behind the screen) but have a lens or capture portion aligned with the hole for capturing light directed towards the hole. A camera located in a hole may be located substantially behind the screen (for example a camera body may be located behind the screen) but have a portion extending through the hole (for example the camera lens or capture portion). A camera may be located at a hole such that its lens is aligned with the centre of the hole.
A display screen may be virtual. A display screen may be simulated in a (e.g. display of a) virtual or augmented reality system. A display screen may be viewed from a virtual or augmented reality headset (which may, for example, be glasses or contact lenses or any device worn by a user for viewing augmented or virtual reality). An eye tracking device and/or controller may be located with (for example mounted on) the headset. A virtual display screen (e.g. for displaying images from a remote camera array) may be displayed and/or located in the view of a headset such that it appears coincident with a local camera plane or a local camera array. A virtual display screen may allow the cameras to capture images "through" (e.g. through a plane where the virtual image is rendered) a display screen without interference between the display screen and the cameras. A virtual display and headset may provide for improved eye tracking or stereoscopic viewing.
A display screen may be a one-way screen. A one-way display screen may comprise a reflective layer. A one-way display screen may comprise a perforated layer. A one-way display screen may have separations between a portion of pixels such that light can pass between the pixels through the display screen. A one-way display screen may have blocking portions behind each pixel to prevent light from the pixel travelling directly rearward. A one-way display screen may be transparent distal to or around the blocking portions.
Advantageously the embodiments described herein may be computationally efficient in producing a telepresence effect. Advantageously the embodiments described herein may provide a telepresence effect without depth information or a depth map (for example captured by a depth capture device) of the scene captured by a camera array.
The cameras in a camera array may be of similar or different types, for example different field of view, resolution, focal length, or other camera definition or characteristic. A camera array may comprise an array of lenses and/or mirrors to direct light towards one or more image capture devices. A camera array may be a light field camera array.
A camera array may be a device suitable for capturing images and/or video from a number of locations of an array. A camera may be an optical device for capturing images and/or video. A camera may be suitable for capturing images from (e.g. substantially or close to) a point in space (e.g. taking into account lens or other effects) (e.g. a nodal point). A camera may capture light inside or outside the visible spectrum of light. A camera may capture only visible light. A camera may capture visible light or infra-red light or ultraviolet light. A camera may be an image or video capture sensor or device.
A camera nodal point may be a camera's entrance pupil or "no parallax point". A camera point of view or nodal point may be located at a location with respect to the camera as determined by a pin-hole camera model or approximation. A camera point of view or nodal point may be located at substantially the location where the light rays captured by the camera converge.
A camera plane may be a plane that provides a best-fit match to the locations of the camera point of views or nodal points. A camera plane may be shaped to match the camera array. A camera plane may be bounded by the camera array. A camera plane may extend outside of the camera array. A camera plane may be a flat plane. A camera plane may be gradually curved. A camera plane may be curved in one or two or three dimensions. A camera plane may extend along (or follow a best fit curve between) lines formed between adjacent cameras (e.g. camera point of views).
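One standard way to realise such a best-fit camera plane is a least-squares fit through the camera nodal points via singular value decomposition, sketched below with invented sample data.

    import numpy as np

    def fit_camera_plane(nodal_points):
        # nodal_points: (N, 3). Returns (centroid, unit normal) of the best-fit plane.
        centroid = nodal_points.mean(axis=0)
        _, _, vt = np.linalg.svd(nodal_points - centroid)
        return centroid, vt[-1]               # direction of least variance = plane normal

    pts = np.array([[x * 0.1, y * 0.1, np.random.normal(0.0, 1e-3)]
                    for x in range(4) for y in range(4)])
    centroid, normal = fit_camera_plane(pts)
    print(centroid, normal)                   # normal close to (0, 0, +/-1)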
Each camera of the camera array may capture an image frame. Where a plurality of image frames are obtained by the camera array, each image frame may be from a different camera. Each image frame may be taken at (e.g. substantially) the same time (or same time period (for example if staggered or similar) and/or for example to combine into an image that represents a single frame (e.g. of a video)).
Each image frame may have an image frame origin and image frame co-ordinate system (for example a two-dimensional cartesian or polar coordinate system). The image frame origin may be located at the centre or corner (or other location) of the image frame. The image frame origin may be located at a position on the image frame captured by the optical axis or camera direction of the camera.
The extrapolated frame location may be a position (e.g. single position) on the image frame according to the image frame origin and image frame co-ordinate system. The image frame coordinate system may be a count of vertical and horizontal pixels of the image frame. The image frame co-ordinate system may be based on distances, for example in relation to the composition geometry.
An extrapolated frame location may be determined from a composition model (for example the relative geometry of the composition point of view and virtual centres of view and virtual camera directions) or determined directly from the camera array (for example the relative geometry of the extrapolated point of view and camera points of view and camera directions). Identifying an extrapolated frame location based on the extrapolated point of view may comprise identifying the trigonometry between the extrapolated point of view and the camera points of view and camera orientations/directions.
An image frame may correspond with a single corresponding virtual centre of view and/or camera and/or camera point of view (for example of the camera that captured the image frame).
A camera array may have a shape corresponding to a display. A camera array may be shaped and/or curved and/or sized according to the shape and/or curve and/or size of a display (for example the display that is to display the images captured by the camera array).
The step of identifying the extrapolated frame location for each image frame may comprise applying a predetermined relationship between view angles within a camera's field of view and locations on an image frame captured by the camera. A predetermined relationship may comprise a look-up table or algorithm. The method may comprise obtaining or receiving or determining a (for example predetermined) relationship between camera view angles and locations on an image frame, for each camera of the camera array.
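Under an ideal pinhole model, such a predetermined relationship reduces to a tangent projection, sketched below; for real lenses a per-camera calibrated look-up table would play the same role. The names and numbers are illustrative.

    import math

    def angle_to_pixel(theta_x_rad, theta_y_rad, focal_px, centre_px):
        # View angles are measured from the optical axis (the camera direction).
        u = centre_px[0] + focal_px * math.tan(theta_x_rad)
        v = centre_px[1] + focal_px * math.tan(theta_y_rad)
        return u, v

    print(angle_to_pixel(math.radians(10), math.radians(-5),
                         focal_px=800.0, centre_px=(640.0, 360.0)))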
A display device may comprise an OLED, LCD, plasma or LED screen or projector or virtual display device (for example relating to virtual or augmented reality). A display device may be any device suitable for displaying an image (for example digital image). A display device may comprise a display screen. A display screen may comprise an array of pixels for generating an image. A display device may comprise an analogue screen. A display device may comprise a transparent, semi-transparent or one-way screen.
An eye tracking device may be mounted on or in the display device. The eye tracking device may be positioned adjacent to the display screen. The eye tracking device may be positioned with respect to the display screen such that the eye tracking device can determine the eye location of a user in three dimensions with respect to the display nodes (e.g. the display point of views). An eye location may have three-dimensional (for example cartesian or spherical) co-ordinates with respect to the display nodes (and/or display screen or device). An eye location may comprise the approximate or precise location of a pupil or a single eye or of a pair of eyes (for example a point between the eyes). An eye tracking device may utilise feature location to identify features that can assist in the determination of eye location (for example head location or head outline or head tilt or facial features or shoulder features or bodily features or other features).
An eye tracking device may be mounted on an augmented or virtual reality headset. An eye tracking device may track or determine the location and/or orientation of a headset (for example in relation to the surrounding environment, and/or to keep the display screen in a fixed location as the eye location of the user moves).
A display and/or display screen and/or display device may have display geometry. A display device may display a composition image. The method may comprise displaying the composition image (or a portion of the composition image). A display device may display a composition image such that the composition geometry matches the display geometry. A display device may display a composition image such that the composition geometry is displayed at real-world scale from the perspective of the user. Distances between and/or relative positions of virtual centre of views may be the same as the distances and/or relative positions of display point of views.
Each display point of view (or display node) may be a location on the display device or display screen. The display geometry may comprise e.g. the locations of the display point of views. Each display point of view may correspond to a virtual centre of view. Each display point of view may correspond to a camera of the camera array. The camera array geometry may be preserved when the composition image is displayed on the display screen.
A transformation may be applied to an image frame. A transformation may comprise scaling, warping, shearing. A transformation may be applied by a controller located adjacent (for example geographically near or local) the display device and/or by a controller located adjacent (for example geographically near or local) the camera array.
A (e.g. parallax) feature may be, for example, an edge, shape or item.
An interpolation algorithm may comprise artificial intelligence (e.g. an artificial neural network).
A camera definition may refer to the features and/or image quality and/or camera characteristic of a camera. A camera definition for a camera may refer to the highest possible specification the camera may be configured in (for example with all settings set to their maximum value). A camera definition may refer to the size of images taken by the camera (for example number of pixels or file size or file size per pixel).
In a method of capturing images from a camera with a definition, a low definition image may be obtained from a high or low definition camera. A high definition camera may be configured to capture low definition images (for example images captured at the highest possible specification a low definition camera may be configured in). A low definition camera or low definition image may be black and white only (for example optionally including shades of grey).
The definitions of high and low may be relative to each other. A low definition camera or image may be less than 5 megapixels (MP), 4MP, 3MP, 2MP, 1MP, 0.5MP, 0.4MP, 0.3MP, 0.2MP, 0.1MP, or 0.05MP. A high definition camera or image may be greater than 5 megapixels (MP), 10MP, 15MP, 20MP, 25MP. A high definition camera may be suitable for capturing 1080, 1080p, 4K, 8K or other high definition image formats.
Fig. 1 shows a diagram showing an arrangement of the telepresence system in accordance with some embodiments;
Fig. 2a shows a top down diagram showing an arrangement of the telepresence system in accordance with some embodiments;
Fig. 2b shows a top down diagram showing an arrangement of the telepresence system in accordance with some embodiments, which includes a portion of the features of Fig. 2a relating to a composition model;
Fig. 2c shows a top down diagram showing an arrangement of the telepresence system in accordance with some embodiments, which includes a portion of the features of Fig. 2a relating to a camera array and scene;
Fig. 3a shows a diagram of a composition image in accordance with some embodiments;
Fig. 3b shows a diagram of a composition image in accordance with some embodiments for demonstrating the positioning of image frames;
Fig. 3c shows a diagram of a composition image in accordance with some embodiments;
Fig. 4 shows a side view of a telepresence system with display and eye tracking device for providing a telepresence effect to a user in accordance with some embodiments;
Fig. 5 shows a side view of a telepresence system for providing a telepresence effect to a pair of communicating users in accordance with some embodiments;
Fig. 6a shows a top down diagram showing an arrangement of the telepresence system in accordance with some embodiments, which shows a parallax edge in the scene;
Fig. 6b shows a composition image relating to the arrangement of Fig. 6a;
Fig. 7a shows a top down diagrammatic arrangement of cameras in a camera array according to some embodiments;
Fig. 7b shows a diagrammatic arrangement of cameras in a camera array according to some embodiments;
Fig. 8a shows a diagram showing an arrangement of a camera of a camera array for capturing a scene in accordance with some embodiments;
Fig. 8b shows an image frame relating to the arrangement of Fig. 8a;
Fig. 9a shows a diagrammatic arrangement of a virtual or augmented reality system in accordance with some embodiments;
Fig. 9b shows an alternative diagrammatic arrangement of a virtual or augmented reality system in accordance with some embodiments.
"c,:ccipti on Fig.1 shows a telepresence system 100 arranged to capture a scene 110. Fig.1 shows a camera array 120, a composition model 140, a composition image 150 and a display 180.
A camera array 120 comprises a camera plane 119 and a plurality of cameras 121. In the example of Fig. 1, sixteen cameras are shown. The number of cameras shown in camera array 120 is for illustration purposes, and a greater number or lesser number of cameras may be present in other embodiments (for example a much greater number of cameras may be present in the array, for example greater than 200 or 300 or 400 or 500). In other embodiments, the cameras of camera array 120 may be arranged differently, for example different overall shape or location across the array. Each camera 121 has a camera point of view 125 and camera direction 123.
The camera plane 119 has a camera front side 115 and a camera rear side 116. The camera front side corresponds with the side facing the scene 110. The camera front side 115 corresponds with the side the cameras 121 are facing (for example oriented to capture images of). The camera rear side 116 corresponds with the side facing away from the scene 110 (e.g. the capture direction). An extrapolated point of view 130 is located in three-dimensions a distance from the camera plane 119 on the camera rear side 116. The extrapolated point of view 130 may be a theoretical or virtual or representative location. The extrapolated point of view 130 represents the desired point of view (e.g. that will be constructed) for viewing scene 110.
In various embodiments, an extrapolated point of view may be described according to the following. An extrapolated point of view may be from behind the camera array. An extrapolated point of view may be positioned on the opposite side of the camera array to the direction the (for example majority of) cameras are facing. An extrapolated point of view may be more distant from a scene than the camera array. An extrapolated point of view may have co-ordinates in three dimensions. An extrapolated point of view may be located outside of a line or plane formed between any two or three cameras of a camera array. An extrapolated point of view may lie (e.g. substantially) within a back projected field of view of a camera of the camera array. A back projected field of view may represent a camera's field of view extended backwards through its corresponding camera point of view. A back projected field of view may represent the field of view observed by a camera if it was mirrored with respect to the camera plane. A back projected field of view may extend away from a camera rear side. An extrapolated point of view may be located a separation distance from the camera plane (for example away from the camera rear side). For example, the separation distance may be (or correspond to) a (for example minimum or maximum) distance between adjacent cameras. For example, the separation distance may be (or correspond to) a distance between the closest two cameras to the extrapolated point of view.
A composition model 140 is shown with a composition plane 139 and a plurality of virtual point of views 141. Each virtual point of view 141 has a corresponding virtual camera direction 143. A composition point of view 160 is shown as well as composition vectors 161 that extend between the composition point of view 160 and each virtual point of view 141.
A composition image 150 comprises a plurality of virtual centre of views 151 arranged on it.
A display 180 is shown. The display 180 has a plurality of display point of views 181. Display point of views 181 are display nodes as described herein. The display 180 comprises a user viewpoint 182. The arrangement of the display point of views 181 on the display represents the arrangement of the camera point of views 125 (for example relative locations). The display 180 may have display geometry. The display geometry may comprise relative locations of the display point of views 181 and the user viewpoint 182. The display geometry may correspond to the camera geometry. The display point of view 181 may be a theoretical point, for example with respect to a display surface (e.g. display nodes or display point of views) or frame or border, that may be used for positioning and/or scaling an image for display (for example composition image 150). The display 180 may display composition image 150 such that the virtual centre of views 151 are (e.g. substantially) coincident with the display point of views 181. The user viewpoint 182 may represent an eye or pair of eyes of a user (for example a point between or in relation to the eyes). In other embodiments there may be a plurality of user viewpoints and corresponding composition points of view and extrapolated points of view.
The camera array 120, composition model 140 and display 180 may each be located in geographically separate locations. The composition model 140 may exist virtually, for example generated or modified by a computer or controller or cloud. The telepresence system 100 is for constructing a view (e.g. an image) from images taken by camera array 120 that, when viewed on display 180 at user viewpoint 182, gives a user the impression they are viewing scene 110 from extrapolated point of view 130.
Regarding the camera array 120, each camera point of view 125 has a single camera direction 123. The camera directions 123 extend away from the camera point of views 125. The camera directions 123 represent the direction that cameras 121 are facing. The camera directions 123 may represent a characteristic of the cameras 121, for example their orientation or an optical axis (e.g. of a lens) or a normal to a camera sensor. The camera directions 123 may represent the lines along which the centre of an image captured by a corresponding camera 121 lies. In Fig. 1 the cameras 121 of camera array 120 all have parallel camera directions 123; however, in other embodiments the camera directions 123 may be different from one another (for example different camera directions may have different orientations in relation to the camera plane 119). A camera may capture light rays at angles (including zero angle) with respect to the camera direction at the camera point of view.
It is noted that the term camera is used to refer to a single point of view. In various embodiments, a single physical camera may capture images from multiple point of views. A single physical camera or sensor may capture images from multiple point of views or nodal points using, for example, a plurality of mirrors or lenses. In such an embodiment, each point of view or nodal point may correspond to a camera as described herein.
The camera plane 119 is a theoretical representation of the shape of camera array 120. In Fig. 1 the cameras 121 are arranged according to a flat plane, and therefore camera plane 119 is also flat.
The camera array 120 has camera array geometry. The camera array geometry may define the locations and orientations of camera array 120. For example, the camera array geometry may comprise information describing the distances and relative locations of the camera point of views 125 and the extrapolated point of view 130. For example, the camera array geometry may comprise the orientations of the camera directions 123 in relation to the camera point of view 125 and/or the camera plane 119.
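As an illustration only, the camera array geometry might be carried in a structure such as the following; the field names are invented to mirror the terms above and are not identifiers from the disclosure.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class CameraArrayGeometry:
        camera_povs: np.ndarray        # (N, 3) camera point-of-view locations
        camera_directions: np.ndarray  # (N, 3) unit vectors (optical axes)
        plane_origin: np.ndarray       # a point on the camera plane
        plane_normal: np.ndarray       # unit normal of the camera plane

        def pitch(self):
            # Minimum distance between distinct camera points of view.
            d = np.linalg.norm(self.camera_povs[:, None] - self.camera_povs[None], axis=-1)
            return d[d > 0].min()

    geom = CameraArrayGeometry(
        camera_povs=np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]]),
        camera_directions=np.tile([0.0, 0.0, 1.0], (2, 1)),
        plane_origin=np.zeros(3), plane_normal=np.array([0.0, 0.0, 1.0]))
    print(geom.pitch())   # -> 0.1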
According to various embodiments, the cameras of a camera array may be spaced evenly or regularly. Alternatively, the cameras of a camera array may be arranged according to a non-regular shape. A non-regular camera array may optimise the camera locations to better capture the scene and/or replicate the shape of a display screen.
The camera array 120 is arranged to capture a scene 110. Scene 110 shown in Fig. 1 is purely for illustration purposes and the scene may be different (for example it may not include a person).
The composition model 140 comprises a composition plane 139, virtual point of views 141 and virtual camera directions 143.
The composition model 140 may be a virtual representation of the camera array 120 and/or display 180. The composition model does not include information corresponding to the scene (for example it is not for creating a three-dimensional model of the scene). The composition model 140 comprises (e.g. virtual) composition geometry that matches or represents the camera array geometry. The composition model 140 may comprise (e.g. virtual) composition geometry that features elements of display geometry of display 180 (for example screen size or orientation or shape). Geometry may refer to (e.g. relative or absolute) locations and/or rotations and/or orientations and/or distances between features.
The composition model 140 is shown as three-dimensional in Fig. 1. In such an embodiment, the composition model 140 is determined so as to model the camera array 120, and calculations may be performed on the composition model 140. In the Fig. 1 embodiment, both the composition plane 139 and the composition image 150 are two-dimensional. The three-dimensional composition model only models the camera array (and optionally the display) and extrapolated point of view.
In various embodiments, a composition image may be generated from the composition model. The composition image and virtual centre of views may be determined from the composition plane 139 and the virtual point of views. A composition model or composition image may be for combining images from a camera array. A composition image may be a digital image or part of a series of images or video. A composition model may be for translating data from the camera array into a composition image for display to a user.
In other embodiments, the composition plane may be three-dimensional (for example curved), such that it matches the shape of a similarly shaped (for example curved) camera array and/or similarly shaped (for example curved) display. If the composition plane is three-dimensional (for example curved), a two-dimensional composition image may be determined from it by "flattening out" the composition plane. When a "flattened out" two-dimensional composition image is displayed on a three-dimensional display screen (for example curved) it is transformed back into the shape of the composition plane from the point of view of a user.
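For illustration only, one way such "flattening out" might be computed is sketched below (Python with numpy; the cylindrical shape, the axis convention and the function name are assumptions for the example, not features of the disclosure). A point on a cylindrical composition plane maps to two-dimensional composition image co-ordinates of arc length across the cylinder and height along it.

```python
import numpy as np

def flatten_cylindrical_point(point_xyz, radius):
    """A minimal sketch of "flattening out" a cylindrical composition
    plane: a 3D point lying on a cylinder of the given radius (axis
    assumed along y) maps to 2D co-ordinates (arc length, height) on
    the flattened composition image."""
    x, y, z = point_xyz
    theta = np.arctan2(x, z)   # angle around the assumed cylinder axis
    return radius * theta, y   # (distance across the image, height)
```

Displaying the flattened image on a matching curved screen would reverse this mapping from the point of view of the user, consistent with the passage above.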
Calculations may be performed on data from the camera array 120 instead of determining a three-dimensional composition model 140. For example such calculations may allow the determination of parameters to position the image frames without requiring a three-dimensional model. However, both approaches represent methods for performing analysis of the camera array 120 (for example its features and geometry). The composition model 140 shown in Fig. 1 may be a representation of the calculations performed by a computer or controller on camera array 120 in order to arrange the composition image 150 (for example position image frames).
In the Fig. 1 example the camera plane 119, composition plane 139 and display 180 are the same shape (e.g. flat). In other embodiments the camera plane, composition plane and display (for example the shape of a display screen of the display) may each have the same non-flat (e.g. curved or part-spherical) shape. The camera array may be shaped to match a display intended for displaying the telepresence effect. If the display has a more complicated shape, for example an unusual shape (e.g. different from a traditional display) to represent e.g. a cockpit window (or other unusual shape), a camera array may be constructed according to such unusual shape.
Each virtual point of view 141 is a virtual representation of a camera point of view 125. Each virtual point of view 141 has the same location as a single camera point of view 125. The virtual centre of views 151 have the same geometry and layout on the composition image 150 as the virtual point of views 141 on the composition plane 139.
In embodiments where the composition plane is three-dimensional (e.g. curved or non-planar), the composition plane may have a two-dimensional co-ordinate system across the plane. The virtual point of views may be located on the composition plane by the two-dimensional co-ordinates. The virtual centre of views on the (e.g. "flattened out") composition image may be arranged according to the virtual point of views defined by the two-dimensional co-ordinates.
The composition model 140 has a composition point of view 160 that models the extrapolated point of view 130. The composition point of view 160 is positioned in relation to the virtual point of views 141 in the same way as the extrapolated point of view 130 is positioned in relation to the camera point of views 125. The composition model 140 has composition vectors 161 extending between the composition point of view 160 and each virtual point of view 141. The composition vectors 161 represent the extrapolated camera views 131.
The arrangement of Fig.1 may be used to determine an angular and positional relationship between the extrapolated point of view 130 and the cameras 121 of the camera array 120. For example a positional relationship may be determined between the extrapolated point of view 130 and each camera point of view 125. For example an angular relationship may be determined between each extrapolated camera view 131 and each camera direction 123 at each corresponding camera point of view 125 (for example where a line coincident with a camera direction 123 intersects with an extrapolated camera view 131). An angular relationship may be in three dimensions (e.g. angular spherical co-ordinates). An angular relationship or angular spherical co-ordinates may comprise a zenith direction and polar angle and azimuthal angle (and for example where radial distance or separation is not required). The angular relationship may comprise more than one angle (for example an angle and a rotation, or polar angle and azimuthal angle). Each angular relationship may be determined from the orientation of a camera direction 123 with respect to the camera plane 119 and the orientation of an extrapolated camera view in relation to the camera plane 119 at each camera point of view 125.
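By way of illustration, such an angular relationship may be computed along the following lines (a minimal Python/numpy sketch; the zenith convention and all names are assumptions for the example, not part of the disclosure):

```python
import numpy as np

def angular_relationship(camera_pov, camera_dir, extrapolated_pov,
                         zenith=np.array([0.0, 1.0, 0.0])):
    """Polar and azimuthal angles between a camera direction and the
    extrapolated camera view at one camera point of view (a sketch).
    Assumes the zenith axis is not parallel to the camera direction."""
    # Extrapolated camera view: from the extrapolated point of view to
    # the camera point of view (radial distance is not needed).
    view = camera_pov - extrapolated_pov
    view = view / np.linalg.norm(view)
    d = camera_dir / np.linalg.norm(camera_dir)

    # Polar angle between camera direction and extrapolated camera view.
    polar = np.arccos(np.clip(np.dot(d, view), -1.0, 1.0))

    # Azimuthal angle around the camera direction, measured from the
    # projection of the zenith axis onto the plane normal to d.
    ref = zenith - np.dot(zenith, d) * d
    ref = ref / np.linalg.norm(ref)
    perp = view - np.dot(view, d) * d
    if np.linalg.norm(perp) < 1e-12:  # view parallel to camera direction
        return polar, 0.0
    perp = perp / np.linalg.norm(perp)
    azimuth = np.arctan2(np.dot(np.cross(ref, perp), d), np.dot(ref, perp))
    return polar, azimuth
```

As the passage notes, no radial distance appears in the result: only the two angles are needed per camera.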
The determination of an angular and positional relationship between the extrapolated point of view 130 and the cameras 121 may be, for example, determined using the camera array geometry directly or by constructing a composition model 140.
The arrangement shown in Fig. 1 is for extrapolating a point of view and subsequently providing a telepresence experience (which may include a virtual telepresence experience). As described herein, there are relationships between various components. The better these are aligned and/or modelled and/or simulated and/or replicated and/or measured and/or determined, the better the telepresence effect. However, exact alignment may not be possible, for example due to tolerances (for example tolerances in mounting the cameras). If exact alignment is not possible the telepresence effect may not be as realistic, however this may not be noticeable, may be corrected, or may be acceptable.
Fig. 2a, 2b and 2c show top-down views of a telepresence system 200 similar to the arrangement of Fig. 1. Fig. 2a shows both a composition model 240 and a camera array 220. The composition model 240 is shown diagrammatically virtually co-located with the camera array 220 to show how composition model geometry may match camera array geometry. Fig. 2b shows just the features relating to the composition model 240. Fig. 2c shows just the features relating to the camera array 220 and scene 210.
Fig. 2a shows a camera array 220 including a camera plane 219 and a plurality of cameras 221. Each camera 221 has a camera point of view 225 that lies on the camera plane 219. Each camera 221 has a camera direction 223 that extends from the camera point of view 225 and represents the orientation of the camera. The camera array 220 is arranged to capture images or video of a scene 210. The camera array 220 has a camera front side 215 and a camera rear side 216. An extrapolated point of view 259 is located distal to the camera plane 219 on the camera rear side 216.
Extrapolated camera views 231 are lines or vectors that extend between the extrapolated point of view 259 and the camera points of view 225. Extrapolated view angles 227 exist between the extrapolated camera views 231 and the camera directions 223.
A composition model 240 comprises a composition plane 239. The composition model 240 comprises a plurality of virtual points of view 241 positioned on the composition plane 239. A plurality of virtual camera directions 243 extend from corresponding virtual points of view 241.
A composition point of view 260 is located distal to the composition plane 239 (for example distal in a direction normal to the composition plane 239). Composition vectors 261 are lines that extend between the composition point of view 260 and the virtual points of view 241. Composition view angles 245 exist between the composition vectors 261 and the virtual camera directions 243.
The diagram of Fig. 2a is a top down view and as such it will be appreciated that the features shown represent one row of a camera array. In the Fig. 2a example the composition plane 239 and camera plane 219 are flat planes that extend into the figure. The composition point of view 260 or extrapolated point of view 259 may not be at the same vertical location as the cameras 221 visible in Fig. 2a. The composition view angles 245 may have components into or out of the page. The composition view angles 245 may define three-dimensional orientations or spherical angles. Such orientations may have components that define an angular relationship from a point on a plane and a zenith axis to a line extending away from the point.
Fig. 2b and 2c each show portions of the features shown in Fig. 2a. For example Fig. 2b shows the features of Fig. 2a relating to the composition model 240 and composition point of view 260 (labelled as 240' and 260' respectively in Fig. 2b).
Fig. 2a shows many features co-located in the diagram. The camera plane 219 and composition plane 239 are co-located in the diagram. Each virtual point of view 241 is co-located with a corresponding camera 221. Each virtual camera direction 243 is co-located with a corresponding camera direction 223. Each virtual camera direction 243 is aligned in the same direction as a corresponding camera direction 223.
The colocation of features in Fig. 2a is shown to demonstrate how the camera geometry (the geometry of the camera array 220,220') is replicated by the composition geometry (the geometry of the composition model 240,240'). The colocation of features in Fig. 2a may illustrate the composition model 240 being defined in virtual space with the same (or substantially the same) co-ordinates and/or size and/or shape as the camera array 220. For example, all the features shown in Fig. 2b may be virtual features (for example created and/or existing in a controller or computer memory). For example, all the features shown in Fig. 2c may exist in the real world (however for example the camera directions 223,223' and camera point of views 225,225' may be derived or approximated features of the physical cameras 221,221').
The composition vectors 261 are shown coincident with the extrapolated camera views 231. In the Fig. 2a and Fig. 2c examples, the extrapolated camera views 231' are shown extending beyond the camera plane 219' into the scene 210'. As such the extrapolated camera views 231' intersect with portions of the scene 210'. The portions of extrapolated camera views 231' extending beyond the camera plane 219' represent light rays or vectors that enter cameras 221' (at their nodal point and point of view), that, if extended beyond (for example behind) the cameras 221' and camera plane 219', would converge on the extrapolated point of view 259'. Each camera 221' captures one light ray (for example a light ray at a single orientation defined by an angular spherical co-ordinate system centred on a camera point of view) that is aligned with an extrapolated camera view 231'. This relationship or effect is true regardless of the scene (for example of the depth of the scene). A location on an image taken by a camera (for example a two-dimensional location with respect to the image pixels or image corner) may correspond to a single orientation defined by an angular spherical co-ordinate system (for example the combination of polar and azimuthal angles but not radial distance) centred on a camera point of view and in relation to the camera direction. This relationship or effect is true regardless of the scene (for example of the depth of the scene).
The parts of images taken by cameras 221 aligned with the extrapolated rays of light will be substantially "correct" (for example they may have the correct parallax and/or depth) from the extrapolated point of view 259. Parts of such images may become less "correct" the further the image location is from the "correct" part. This effect has been found to apply to extrapolated points of view. This effect does not exist when constructing interpolated points of view (for example constructing a point of view between cameras). The "correct" parts of images may be identified using, for example, (e.g. only) positional and angular information relating to the camera point of views, camera directions and extrapolated points of view. Depth information for the scene is not required.
Fig. 3a, Fig. 3b and Fig. 3c show composition images. The composition image of Fig. 3a, Fig. 3b and/or Fig. 3c may, for example, be the same as the composition images (140 and/or 240) shown in Fig. 1 or Fig. 2a.
Fig. 3a shows a composition image 339. Arranged on the composition image 339 are a plurality of virtual centre of views 341. In Fig. 3a there are sixteen virtual centre of views 341, however in other embodiments there may be a greater or fewer number and/or they may be arranged differently. In Fig. 3a, each virtual centre of view 341 has a corresponding image frame 371. In Fig. 3a, each image frame has an extrapolated frame location 351. In Fig. 3a, each extrapolated frame location 351 is located within the boundary of its corresponding image frame 371. However in other embodiments an extrapolated frame location may be located with respect to its corresponding image frame 371 but outside of the boundary of the corresponding image frame 371. In Fig. 3a, each image frame 371 is centred on its corresponding virtual centre of view 341. A frame translation vector 331 extends between each extrapolated frame location 351 and its corresponding virtual centre of view 341.
Fig. 3b shows the features of Fig. 3a in a different arrangement. In Fig. 3b, each image frame 371' is positioned on composition image 339' such that its extrapolated frame location 351' is located on (or coincident with) the corresponding virtual centre of view 341'.
In Fig. 3b, each virtual centre of view is in the same location on the composition image as in Fig. 3a. In Fig. 3b, each extrapolated frame location is in the same location in relation to its corresponding image frame as in Fig. 3a. In Fig. 3b each image frame 371' and its respective extrapolated frame location 351' have been moved jointly with respect to their positions in Fig. 3a. In Fig. 3b each image frame 371' and its respective extrapolated frame location 351' have been moved together by a direction and distance defined by their corresponding frame translation vector 331 compared to Fig. 3a. In Fig. 3a each image frame 371 is centred on its corresponding virtual centre of view 341. In Fig. 3b each image frame 371' is located such that its extrapolated frame location 351' is co-located with the virtual centre of view 341'. The arrangement of Fig. 3a may be for illustration or descriptive purposes. In a method of arranging image frames on a composition image, the image frames do not need to be first arranged in the arrangement of Fig. 3a before being translated into the arrangement of Fig. 3b. In a method of arranging image frames on a composition image, image frames may be located on the composition image directly in accordance with Fig. 3b.
The image frames shown in Fig. 3a and Fig. 3b are sized for illustration purposes. Image frames may be larger or smaller, for example in comparison to the size of the composition image. The size of image frames may vary across the composition image (for example if certain cameras have a narrower field of view or definition or resolution or alternative parameter that causes a smaller or larger image frame size). A larger image frame size may have more pixels and/or larger area and/or larger width and/or height.
Fig. 3c shows the same arrangement as Fig. 3b but with larger frame sizes. For example image frames 471 are larger than image frames 371'. Composition image 339" is the same size as composition image 339' and virtual centre of views 341" are located in the same locations as virtual centre of views 341'. The extrapolated frame locations 451 have been scaled with the image frames 471 such that their relative location on the image frames 471 is the same as the relative locations of extrapolated frame locations 351' on image frames 371'.
In Fig. 3c the image frames overlap once placed or located on the composition image. Such an arrangement may ensure that portions of the composition image or composition plane that are not covered by an image frame are minimised (for example in area). The image frames of Fig. 3a and Fig. 3b have been made smaller for illustration purposes (for example to explain or observe the locations of image frames, which for example may be harder to observe in a patent diagram if they were larger).
In Fig. 3a, 3b and 3c, the arrangement of the composition image is such that the locations and arrangement of the virtual centre of views correspond with the camera point of views of a camera array. The image frames may be images (or part of video) captured by the cameras corresponding to the corresponding virtual centre of views. If the image frames are part of a video, each frame of the video may be arranged in accordance with the arrangement of Fig. 3b or 3c.
As an extrapolated point of view moves or changes over time, the extrapolated frame locations will change on their corresponding image frames. The image frames may move on the composition model (for example over time, or as the extrapolated point of view moves or changes) to maintain the arrangement such that their (for example new) extrapolated frame location is co-located with the (for example static) virtual centre of view.
The following method may be described in reference to the arrangements shown in Fig. 1, 2a and 3c, which may be considered a single embodiment.
A plurality of cameras 121 are arranged on a camera array 120 to capture a scene 110. A camera plane 119 is defined such that it is a best fit to the camera point of views 125. The cameras 121 may be adjusted such that their camera point of views 125 lie on a (for example flat) camera plane 119. The camera directions 123 are recorded.
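A best-fit camera plane may, for example, be computed by a least-squares method such as the following sketch (Python/numpy; the function and return convention are illustrative assumptions, not the claimed method):

```python
import numpy as np

def best_fit_camera_plane(points):
    """Least-squares plane through a set of camera point of views
    (a sketch). Returns the centroid (a point on the plane) and the
    unit normal of the best-fit plane."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value of
    # the centred points is normal to the best-fit plane.
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[-1]
```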
An extrapolated point of view 130,259 is desired to be constructed from the camera array 120,220. A composition model 140,240 is created or assembled or adjusted such that it has a plurality of virtual points of view e.g. 241 and virtual camera directions e.g. 243. In this embodiment there is one virtual point of view 241 per camera 221 in the camera array 220. The composition model 240 may be created virtually, for example in a controller or computer. The features of the composition model 240 are assigned geometry and/or locations and/or co-ordinates and/or relative distances such that they are arranged to (for example substantially) represent the camera array. The camera array 220 takes a plurality of image frames 471. For example each camera 221 of the camera array 220 may take a corresponding image frame 471, for example at the same (or substantially the same) time. The images may form part of a video. The image frames captured by the camera array 220 according to the present embodiment are of sufficient size (e.g. field of view) that they overlap when placed or located or positioned on the composition image 339".
A composition point of view 260 is located virtually with respect to the composition model 240 (e.g. virtual point of views and composition plane) such that it represents the location of extrapolated point of view 259 with respect to the camera array 220 (e.g. camera point of views and camera plane). A plurality of angular relationships 245 are determined (for example one per virtual point of view) between the composition point of view 260 and the virtual camera directions 243. The angular relationships are used to determine an extrapolated frame location 451 on each image frame 471 that represents the corresponding extrapolated camera view 231'. Each image frame 471 may be scaled and/or warped and/or transformed (for example according to a pre-determined amount or as a result of features in the image). Each image frame 471 is then positioned on the composition image 339" such that its extrapolated frame location (for example 451) is located on the corresponding virtual centre of view 341" of the composition image 339".
Positioning each image frame may involve code or an algorithm or a controller to determine the location of the image frame. Positioning each image frame may comprise importing or pasting or combining or rendering each image frame digital file into the composition image digital file. Each image frame may be imported or pasted or combined or rendered such that the extrapolated frame location is located on the corresponding virtual centre of view.
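A purely illustrative sketch of such positioning code follows (Python/numpy; the names, the (row, column) convention and the simple overwrite of overlapping pixels are assumptions for the example — resolving overlaps is addressed separately below):

```python
import numpy as np

def position_frame(composition, frame, centre_of_view, extrapolated_loc):
    """Paste one image frame onto the composition image so that its
    extrapolated frame location lands on its corresponding virtual
    centre of view. Arrays are (rows, cols, channels); centre_of_view
    is a (row, col) location on the composition image and
    extrapolated_loc a (row, col) location within the frame."""
    # Top-left corner of the frame on the composition image.
    top = int(round(centre_of_view[0] - extrapolated_loc[0]))
    left = int(round(centre_of_view[1] - extrapolated_loc[1]))
    # Clip the frame to the bounds of the composition image.
    r0, c0 = max(top, 0), max(left, 0)
    r1 = min(top + frame.shape[0], composition.shape[0])
    c1 = min(left + frame.shape[1], composition.shape[1])
    if r1 > r0 and c1 > c0:
        composition[r0:r1, c0:c1] = frame[r0 - top:r1 - top,
                                          c0 - left:c1 - left]
    return composition
```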
In other embodiments the angular relationships (extrapolated view angles) 227 are used (instead of angular relationships 245) calculated from the camera array 220. Angular relationships 227 may be substantially the same as angular relationships 245. In such embodiments a composition model is not required as all calculations are performed directly on camera geometry data (although, for example, the calculations may be representative of a composition model).
The image frames overlap in the current step of the present embodiment. The image frames may be of sufficient size that an image frame obscures a neighbouring virtual centre of view. The image frames may be sized such that three or more image frames overlap once positioned or located on the composition plane. There are several ways to resolve the portions of image frames that overlap and/or are distal to an extrapolated frame location and/or form parts of the composition image distal to virtual centre of views. This may involve, for example, introducing transparency or applying an interpolation algorithm (which may for example comprise artificial intelligence and/or an artificial neural network).
Transparency is applied to image frames 471 for portions distal to the extrapolated frame locations 451. The transparency is applied to portions of (or locations on) the image frames that overlap with other image frames. The transparency applied to portions of (or locations on) image frames is such that the combined transparency is 0% (i.e. opaque or a complete image). For example if three image frames overlap at a portion of the composition image, one image frame may have a transparency of 80%, one 70% and one 50% such that when combined the transparency is 0%. For example the portions may have opacities of 20%, 30% and 50% resulting in a 100% opaque image portion. The transparency of each image within the portion is determined by the portion's distance from the corresponding extrapolated frame location of the image frame. For example if one extrapolated frame location is twice as far as another from the portion, the portion in the corresponding image frame will have double the transparency of the other. The resulting composition image 339" may be (for example substantially) complete or not transparent or interpolated or opaque across the area covered by the image frames 471. Transparency may provide an improved composition image for low processing cost. Transparency may provide an improved composition image where a large number or large density of image frames are present. In other embodiments an interpolation algorithm may be used in combination with or instead of the described transparency.
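One plausible reading of this scheme is inverse-distance opacity weighting, sketched below (Python/numpy; illustrative only, assuming the overlapping portions are already aligned and equal in size). Opacities sum to 100% (combined transparency 0%), and a portion further from its extrapolated frame location receives proportionally less opacity, i.e. more transparency.

```python
import numpy as np

def blend_overlap(portions, distances):
    """Inverse-distance opacity weighting for a region where several
    image frames overlap (a sketch). portions: list of equally sized
    (H, W, C) arrays; distances: distance of this region from each
    frame's extrapolated frame location."""
    weights = 1.0 / (np.asarray(distances, dtype=float) + 1e-9)
    weights /= weights.sum()  # opacities sum to 1 (0% transparency)
    out = np.zeros_like(portions[0], dtype=float)
    for portion, w in zip(portions, weights):
        out += w * portion.astype(float)
    return out.astype(portions[0].dtype)
```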
The extrapolated frame locations 451 may move or change location (for example over time) with respect to their corresponding image frames 471, in relation to a move or change in location of a corresponding extrapolated point of view. The locations of the image frames 471 with respect to the composition image 339" or virtual centres of view 341" may change (for example over time) as the corresponding extrapolated frame locations 451 move or change location within their corresponding image frames 471.
The resulting composition image 339", when displayed on display 180 and viewed from user viewpoint 182, will make the scene 110 appear as if viewed from extrapolated point of view 130. The displayed image may be more correct or have improved telepresence effect (for example with correct parallax and depth) at (for example substantially at) or near/proximal the virtual centre of views and/or display point of views (for example without interpolation). With an increasing number of cameras in the camera array the telepresence effect may be improved and/or the required interpolation reduced. This is achieved without requiring depth information from the scene.
The method described may be repeated for each time frame of a video captured by a camera array. The extrapolated point of view may change over time. The resulting composition image 339" may be used for or by a computer program or controller or simulation of the scene or virtual or augmented experience. The resulting composition image 339" may be displayed on a display screen (for example LCD or LED or plasma or projector or virtually (e.g. appearing at a position in space ahead of the viewer) or through a virtual/augmented reality headset) to a user (for example where the extrapolated point of view represents an eye location of the user with respect to the display nodes of the display screen).
Fig. 4 is a side view showing a slice of a telepresence system. The telepresence system comprises a telepresence display system 499 and telepresence capture system 498. The telepresence display system 499 comprises a display 491 with a display screen 492. Arranged on the display screen are a plurality of display point of views 493. The telepresence display system 499 has an eye tracking device 481. In the Fig. 4 example the telepresence display system 499 also has a controller 505. The eye tracking device 481 is for identifying the location of extrapolated point of view 429. In the Fig. 4 example the extrapolated point of view is co-located with an eye location 428 of a user.
A composition model 438 is shown diagrammatically coincident with display 491. The composition model 438 shown comprises a composition plane 439 coincident with the display screen 492. A plurality of virtual points of view 441 are arranged on the composition plane 439. Each virtual point of view 441 has a corresponding virtual camera direction 443.
The telepresence capture system 498 comprises a camera array 520 and controller 504. The camera array 520 comprises a plurality of cameras 521. Each camera 521 has a camera point of view 525 and camera direction 523. The camera array 520 is arranged to capture images and/or video of a scene 510.
The composition plane 439, virtual points of view 441 and virtual camera directions 443 correspond to the camera plane 519, camera point of views 525 and camera directions 523 as described previously (for example in relation to Fig. 1 or 2a). The virtual points of view 441 have the same geometry as (for example are coincident with) the display point of views 493. Display point of views 493 are display nodes and are theoretical points defined on the display screen 492 for locating the composition image on the display screen 492. In the Fig. 4 example, the composition image is defined by the composition plane 439, and the composition image has virtual centre of views coincident with the virtual point of views 441.
The camera point of views have the same geometry as the display point of views, and the composition point of views are defined to substantially match or replicate both. In other embodiments display point of views may have different geometry to the camera point of views, however the view observed by the user may be scaled or have non-realistic geometry.
The eye tracking device 481 is located above (for example vertically above) the display screen 492. The eye tracking device 481 may be mounted on top of (for example on a top surface of) the display 491. In other embodiments the eye tracking device 481 may be located in other locations (for example to the side or below or within the display 492). The eye tracking device 481 is for determining the eye location 428 of a user. The eye location 428 has geometry with respect to the display point of views 493. For example the eye location may have cartesian co-ordinates (e.g. x, y and z components) or spherical co-ordinates (for example angular spherical co-ordinates and distance). The eye location 428 may be determined in three-dimensional space with respect to a point on the display 491 (e.g. in addition to the display point of views).
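For example, if the eye tracking device reports spherical co-ordinates, an eye location in cartesian co-ordinates relative to a reference point on the display might be recovered as follows (a sketch; the axis conventions and function name are assumptions for the example):

```python
import numpy as np

def eye_location_cartesian(polar, azimuth, distance):
    """Convert a spherical eye-tracker reading into cartesian
    co-ordinates relative to a reference point on the display.
    Assumed axes: y up (zenith), z out of the screen towards the
    user, x to the right."""
    horizontal = distance * np.sin(polar)
    return np.array([horizontal * np.sin(azimuth),   # x
                     distance * np.cos(polar),       # y
                     horizontal * np.cos(azimuth)])  # z
```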
In other embodiments the eye tracking device (e.g. equivalent to 481) may track the location of a headset. In other embodiments the eye tracking device (e.g. equivalent to 481) and/or controller (e.g. equivalent to 505) may be mounted in a headset. In such an embodiment the display (e.g. equivalent to 491) and/or display screen (e.g. equivalent to 492) may be virtual components (for example projected in the view of a headset). If mounted in a headset, the eye tracking device 481 may comprise gyroscopes and/or a configuration to determine the location of the device and/or eyes of the user (for example with respect to the projected display screen or surrounding location).
The eye location 428 is an extrapolated point of view 429. The extrapolated point of view 429 is a point of view from which an image (for example a composition image) is to be constructed from the information captured by the camera array 520. The extrapolated point of view 429 may be determined at (for example substantially at) a single eye location or between a pair of eyes. The eye tracking device 481 may track more than one eye location. For example the eye tracking device 481 may track two eye locations (representing a pair of eyes). When two or more eye locations are tracked, two or more composition images may be generated by the method described herein (for example one composition image per eye location). As such the display device may display two or more composition images (for example one for each eye of each user) to create a stereoscopic (or 3D) effect to one or more users.
The composition plane 439, which exists in virtual space, is shown located such that its virtual point of views 441 are coincident with the display point of views 493 of the display screen 492. Display point of views 493 may have fixed or moveable position in relation to the display screen 492 or camera point of views. The display nodes will generally be fixed during a viewing experience, but may move to change the location and extrapolated point of view of/to the displayed scene. The geometry (e.g. distances between and/or relative locations) of virtual point of views 441 (or virtual centre of views of a composition image) are the same as the geometry of the display point of views 493. Such an arrangement is for ensuring that when a corresponding composition image (for example determined by a method as described herein) is displayed on the display screen 492, that when the user observes a display point of view 493 they view a location that represents a location of a camera 521.
Controller 504 receives data from the camera array 520. Controller 504 sends the data along the data transmission line 506 to controller 505. The data transmission line 506 may for example be a cable, wire, Bluetooth, wi-fi or internet connection. The data transmission line 506 may be any device suitable for transmitting data from one controller to another. Controller 505 receives data from controller 504. Controller 505 receives data from eye tracking device 481. Controller 505 sends images (for example composition images) to the display 491. Controller 505 or 504 may be pre-programmed with dimensional information relating to the display and/or camera array. Controller 505 or 504 may receive dimensional information relating to the display and/or camera array.
The method of creating a composition image may be performed by controller 505. Alternatively, portions of the method may be performed by controller 504, for example if the eye location 428 or extrapolated point of view 429 are sent to controller 504 by controller 505. Controller 504 or 505 may perform de-warping, straightening, de-shearing or other transformations on the images from camera array 520. Controller 504 or 505 may perform transformations or selections on portions of images from camera array 520 (for example in relation to the current received extrapolated point of view 429 received from controller 505). Controller 504 or 505 may apply compression to a plurality of image frames. Controller 504 or 505 may apply different levels of compression to different groups of image frames.
Fig. 4 shows a telepresence arrangement that is one-way. For example there is one camera array generating images or video for one or more distal displays. Such an arrangement can be used by a user to view a remote location with a "3D window" effect. The images and/or video recorded by camera array 520 may be displayed live on display 491. Alternatively, the images and/or video recorded by camera array 520 may be captured at an earlier time for display at a later time on display 491 (i.e. the camera array pre-records images or video). When the displayed content is not live (for example it is pre-recorded), the data transmission line 506 may be removed, and controller 504 may generate a recording. The recording may be transferred to controller 505 for playback by any suitable means (for example a data transmission line or a recording on a physical device).
In other embodiments, the camera array 520 may provide captured images and/or video to a plurality of telepresence display systems (for example similar to telepresence display system 499 comprising a display, eye tracking device and controller). Then, many people can view the scene from their devices with a "3D window" effect.
The camera array for a one-way "3D window" effect or recording or broadcast may be higher definition and/or may have a greater number of cameras and/or be more expensive compared to prior art systems that are related to (e.g. live) two-way telepresence communication.
The one-way "3D window" arrangement, for example as shown in Fig. 4, may show live concerts or sporting events (or any type of live media content). The one-way "3D window" arrangement, for example as shown in Fig. 4, may show pre-recorded nature documentaries or travel shows (or any type of pre-recorded media content). With the telepresence or "3D window" effect being broadcast, there may be a delay in communication from the camera array (similar to a small delay in traditional live television). A delay in a live broadcast, or pre-recorded media, allows a controller on the camera array side (for example controller 504) (or device side (for example controller 505)) to perform more complex operations or processing or post-processing (for example at a later stage in a production centre) on the media content. Such complex operations may optimise the video stream for data transmission or optimise the images for playback. Such complex operations may comprise more time-consuming compression of one or more (for example groups of) images or videos (for example to optimise images or convert to line images (for example vector images)).
Fig. 5 shows a similar arrangement to Fig. 4 but comprises two telepresence systems for providing a two-way telepresence effect.
Fig. 5 shows a first telepresence system 699a comprising a display 691a with a display screen 692a.
An eye tracking device 681a is for identifying the location of an extrapolated point of view 629a and/or eye location 628a. A camera array 620a comprises a plurality of cameras 621a. Each camera 621a has a camera point of view 625a. The camera array 620a is arranged to capture images and/or video of a scene 610a. A plurality of display points of view 641a are arranged on the display screen 692a. The plurality of display point of views 641a are arranged to correspond to the camera point of views 625a.
A second telepresence system 699b comprises a display 691b with a display screen 692b. An eye tracking device 681b is for identifying the location of an extrapolated point of view 629b and/or eye location 628b. A camera array 620b comprises a plurality of cameras 621b. Each camera 621b has a camera point of view 625b. The camera array 620b is arranged to capture images and/or video of a scene 610b. A plurality of display points of view 641b are arranged on the display screen 692b. The plurality of display point of views 641b are arranged to correspond to the camera point of views 625b.
Each telepresence system (699a, 699b) may be according to the embodiment shown in Fig. 4. Each telepresence system (699a, 699b) may have additional features shown in Fig. 4 relating to telepresence system 499.
The camera array 620b and the display 691a may be considered local to a user with eye location 628a (for example and remote to a user with eye location 628b). The camera array 620a and the display 691b may be considered remote to a user with eye location 628a (and local to a user with eye location 628b).
In the Fig. 5 example the camera planes (e.g. 619b) are co-located with the display screens (e.g. 692a). For example the display screen is located on the same plane as the camera plane. Such an arrangement may represent the nodal points or camera points of view 625b being located in small holes in the display screen 692a (or in other embodiments being located coincident with a virtual display screen).
In other embodiments the camera plane 619b may be positioned just behind (for example substantially behind or with a small gap) the display screen 692a (for example with respect to the user viewing the display). If the cameras 621b are located just behind the display screen 692a, the display screen 692a may be semi-transparent such that the cameras 621b can see through the display screen 692a. Locating display screens coincident with (e.g. on the same plane as) camera planes provides improved integration with other telepresence devices. For example if two communicating devices have different separation between display screen and camera plane the perspective may be worsened, or for example the other user's eyes may not appear to focus at the correct depth.
As such in other embodiments two communicating telepresence systems may send or receive (via a controller(s) equivalent to 504 or 505) the separation distance between a display screen (which may be a virtual display screen) and a camera plane (e.g. of camera nodal points). A controller may adjust or move or translate a local display screen (for example actuate or move with actuators), or locate a local virtual display screen, to be a separation distance from a local camera plane to match the remote telepresence system (e.g. according to the received information).
In other embodiments the display screen may be a transparent LED or LCD display. The display screen 692a may be a transparent material onto which a projection is projected.
In other embodiments display 691a and/or display screen 692a may be a virtual display observed by a virtual or augmented reality system. In other embodiments a virtual display may comprise only a display screen (e.g. equivalent to 692a). A virtual display screen may hover in the air (for example at a fixed location in relation to the virtual or real environment the user is in) when viewed through a headset.
A virtual or augmented reality system may more easily provide different images to each eye, for example by specialised glasses or headset, for example to provide a stereoscopic effect. If an augmented or virtual reality glasses or headset is utilised, tracking eye location may be performed more easily by tracking the glasses or headset.
Whilst the telepresence systems shown in Fig. 5 comprise features located with both users, in other embodiments a telepresence system may be considered to be just the features located with one user. In other embodiments a telepresence system may comprise features corresponding to display 691a, (optionally) camera array 620b, eye tracking device 681a and (optionally) a controller.
In Fig. 5 camera array 620a and camera array 620b have different geometry (for example different camera arrangements and/or relative locations). As such, the display point of views (e.g. 641a) on a display are located at different points to the camera point of views corresponding to that display (e.g. 625b). The display point of views (e.g. 641a) are arranged to match a camera array corresponding to the image it is displaying (e.g. 625a).
Fig. 6a and Fig. 6b show an interpolation method. Fig. 6a shows a similar top-down arrangement to Fig. 2a. Fig. 6a shows a camera array 720 including a camera plane 719 and a plurality of cameras 721. Each camera 721 has a camera point of view 723 that lies on the camera plane 719. Each camera 721 has a camera direction 725 that extends from the camera point of view 723 and represents the orientation of the camera. The camera array 720 is arranged to capture images or video of a scene 710.
An extrapolated point of view 760 is located distal to the camera array 720. Extrapolated camera vectors 761 are lines that extend between the extrapolated point of view 760 and the camera point of views 723.
Additionally, Fig. 6a shows a scene feature 793 that has a parallax edge 794. Camera ray 791 exists between camera 721 and the parallax edge 794. Camera ray 792 (which may be referred to as a light ray) exists between camera 722 and parallax edge 794.
A parallax edge may, for example, be a corner or edge or border of an object or feature, as shown in Fig. 6a. A parallax edge may, for example, be an identifiable line in the scene or an image. A parallax edge may, for example, be an edge of high contrast or boundary of an identifiable area in the scene or image.
Fig. 6b shows a similar arrangement to the diagram of Fig. 3c. A composition image 840 has a plurality of virtual centres of view (e.g. 841, 842, 843). Three image frames (871, 872, 873) are shown, each with a corresponding extrapolated frame location (851, 852, 853). The image frames (871, 872, 873) have been located such that their extrapolated frame locations (851, 852, 853) are co-located with their corresponding virtual centre of view (841, 842, 843). Whilst only three image frames are shown in Fig. 6b, this is for illustration purposes, and in other embodiments there may be more image frames (for example each virtual centre of view may have a corresponding image frame).
Fig. 6b shows the locations of the parallax edge in each image frame. Parallax edge 811 corresponds to image frame 871. This represents the location of parallax edge 794 on an image taken by camera 721. Parallax edge 812 corresponds with image frame 872. This represents the location of parallax edge 794 on an image taken by camera 722. Parallax edge 813 corresponds with image frame 873. This represents the location of parallax edge 794 on an image taken by a camera (substantially) vertically above camera 722 (i.e. out of the page in Fig. 6a). Parallax edge 814 represents the location of the parallax edge if viewed from extrapolated point of view 760 (or on an image taken by a camera at extrapolated point of view 760).
After the image frames (e.g. 871,872,873) are positioned on the composition image 840 (e.g. in relation to the centre of views) the different image frames show the parallax edge 794 in different locations (811,812,813) on the composition image. This is the effect described earlier whereby the further the distance on the composition image from a virtual centre of view, the more parallax issues occur (i.e. the resulting image will be less correct). Only locations on a composition image at or near a virtual centre of view (after location of the image frame) are substantially correct (e.g. parallax issues are minimised and depth is substantially correct). Interpolation may be required to resolve the location of parallax edge (811,812,813).
As described previously, transparency may be selectively applied to image frames. However this may result in blur in the composition image. Interpolation required and/or blur and/or parallax issues may be minimised by increasing the number or density of images from (or cameras in) the camera array. In other embodiments a large number of low definition cameras may be used to minimise parallax issues in a composition image (for example, the composition image may be enhanced subsequently using a lower number of high definition cameras).
An interpolation algorithm may use feature identification. Edges (for example parallax edges) may be identified in a plurality of image frames. Where the same feature and/or edge is identified in two or more image frames its location may be resolved. For example, a feature identification algorithm may identify parallax edges 811 and/or 812 and/or 813. These locations can then be resolved onto parallax edge location 814. For example an averaging method or similar may be used, taking into account distance from a virtual centre of view. A feature (e.g. 811, 812, 813) identified in two or more image frames may be moved along image radial lines (e.g. 815) until agreement is reached between the two or more image frames. The two or more image frames may then be selectively modified (for example a portion of the image is stretched or warped) to ensure correct location of the identified feature (and other identified features). Alternatively the feature may be isolated and/or extracted and moved. Any such modification may not be applied to parts of the image frames at or near an extrapolated frame location and/or virtual centre of view (for example as this is unnecessary as described herein because these regions are substantially correct already).
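A minimal sketch of one such averaging step is shown below (Python/numpy; illustrative only, not the claimed algorithm): the resolved location of a feature identified in several positioned image frames is a weighted average that trusts frames whose virtual centre of view lies nearer the feature, consistent with image content being most correct near a centre of view.

```python
import numpy as np

def resolve_feature_location(feature_locs, centre_of_views):
    """Weighted average of the composition-image locations of one
    feature (e.g. a parallax edge) identified in two or more
    positioned image frames. feature_locs and centre_of_views are
    (N, 2) arrays of (row, col) locations; frames whose virtual
    centre of view is nearer the feature are weighted more heavily."""
    locs = np.asarray(feature_locs, dtype=float)
    covs = np.asarray(centre_of_views, dtype=float)
    dists = np.linalg.norm(locs - covs, axis=1)
    weights = 1.0 / (dists + 1e-9)
    weights /= weights.sum()
    return (weights[:, None] * locs).sum(axis=0)
```

A fuller implementation might additionally constrain the resolved location to lie on the image radial lines described below, before selectively warping the frames.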
Image radial lines (e.g. 815) represent a line along which an object would move in the image if moved along a line in the scene parallel to the camera direction. An image radial line (e.g. 815) may be curved if the image comprises warping. Each image may be dewarped such that radial lines (e.g. 815) are straight on the composition plane or composition image.
In other embodiments, an artificial neural network (ANN) may be used in an interpolation algorithm. The training of an ANN may be described in relation to Fig. 1. Images may be taken from a variety of extrapolated point of views and back constructed from corresponding images taken from the camera array 120.
An interpolation algorithm described herein may interpolate regions of a composition image between virtual centres of view without requiring depth information of the scene. Other interpolation algorithms may be used.
Fig. 7a shows a camera array 920 comprising a camera plane 919. The camera array 920 comprises a plurality of cameras (e.g. 930,940,950,960,970,980). Each camera has a camera point of view (e.g. 921) located on the camera plane 919. Each camera (e.g. 930,940,950,960,970,980) has a camera field of view (e.g. 931,941,951,961,971,981). Each camera has a camera direction (e.g. 932,942,952,962,972,982).
The diagram of Fig. 7a is a top down (or side) view. The diagram of Fig. 7a is (or may be) a slice through a two or three-dimensional camera array (for example one that extends in two or three directions). The diagram of Fig. 7a may illustrate a single layer of a camera array. Other layers may be adjacent the layer shown.
The diagram of Fig. 7a illustrates how the camera array 920 may comprise cameras with different fields of view and/or be arranged in different camera directions and/or have different camera densities across the camera plane 919. For example camera field of view 941 is larger than camera field of view 971. Camera 930 may be a camera of a type that provides a larger field of view (for example a wide-angle camera or fish-eye camera). Cameras 940, 950 and 960 are closer together (for example have a greater camera density) than cameras 960, 970 and 980.
Camera 980 is located at the edge of the camera array. Camera 980 is angled away from the centre (for example the central or normal axis) of the camera array 920 or camera plane 919. Camera 980 (or cameras angled away from (or toward) the centre of the camera array and/or located proximal the edge of the camera array) may be useful for increasing the number of extrapolated point of views that lie within a back-projected field of view (or for example if a user is further to the side of a telepresence display). Camera 960 is located approximately halfway between the centre and edge of the camera array 920. Camera 960 is angled away from the centre (for example the central or normal axis) of the camera array 920 or camera plane 919. Camera 940 is located closer to the centre compared to the edge of the camera array 920. Camera 940 is angled towards the centre (for example the central or normal axis) of the camera array 920 or camera plane 919.
Cameras 930 and/or 980 (for example located at the centre or edge of the camera array 920) may have higher definition (e.g. greater resolution or more advanced image capture technology or larger size output digital files for the same scene captured from the same point of view or higher file size per pixel in output digital images) at maximum camera settings compared to cameras 940 and/or 950 and/or 960 and/or 970.
In other embodiments, the arrangement shown in Fig. 7a may have a greater or lesser number of cameras. In other embodiments, the arrangement shown in Fig. 7a may have cameras in different locations (for example different locations on the camera plane or the locations of cameras may be swapped or moved with respect to the arrangement of Fig. 7a).
Fig. 7a is for illustrating a camera array with varying camera features. In other embodiments the arrangement of a camera array may be determined to optimise a particular type of scene or viewing type or application. Embodiments may have cameras angled (for example similar to cameras 940 or 960 in Fig. 7) for capturing views relevant for more sideways located (for example making greater angles with a composition plane or display) extrapolated points of view. In other embodiments all cameras in a camera array may be the same and/or point in the same direction (for example all normal to the camera plane). In other embodiments cameras with different parameters (for example different angle, field of view, resolution) may be distributed across the camera array according to a regular pattern. In other embodiments cameras of similar parameters may be located in clusters or in a regular pattern across the camera array.
Fig. 7b illustrates a camera array 1110. Camera array 1110 comprises three types of camera: a high definition camera 1120, a medium definition camera 1130 and a low definition camera 1140. The different types of camera shown (e.g. only) have definitions relative to each other (for example a high definition camera has a higher definition than a medium definition camera, which in turn has a higher definition than a low definition camera). High definition cameras 1120 may be first definition cameras as claimed herein. Medium definition cameras 1130 may be third definition cameras as claimed herein. Low definition cameras 1140 may be second definition cameras as claimed herein.
Camera array 1110 shows an embodiment with 62 cameras in total. Camera array 1110 has five high definition cameras 1120, fourteen medium definition cameras 1130 and 43 low definition cameras 1140. In other embodiments, different numbers of cameras may be present. The low definition cameras 1140 are most numerous because their main contribution to the composition image is to increase the number of viewpoints and/or reduce parallax error and/or reduce the interpolation required. They may output (for example after compression or transformation) images of low file size, for example black and white (or 1-bit or 2-bit colour) images or may be line images or vector images. The medium definition cameras 1130 are less numerous (for example have lower camera density) than the low definition cameras 1140. Interpolation from the medium definition cameras 1130 may enhance the images from the low definition cameras 1140. For example an interpolated image from a medium definition camera may enhance the colour of a low definition image (e.g. after such image frame is positioned on a composition image) without altering the parallax of the low definition image. Five high definition cameras 1120 are shown. High definition cameras may further enhance the images taken from cameras located between or around them on the composition image, for example enhancing the colour or sharpness or lighting or definition of such images.
In other embodiments camera array 1110 may form part of a larger camera array, for example camera array 1110 may represent a quarter or sixth (or other fraction) of the size of the camera array. In other embodiments one or more or each camera of camera array 1110 may be replaced with a plurality of cameras of the same type. For example in such an embodiment, one or more or each camera (for example corresponding to high definition and/or medium definition and/or low definition cameras) of camera array 1110 may be replaced with 2 or 3 or 4 or 5 or 6 or more cameras, for example such that the total number of cameras is greater than 150 or 200 or 300.
The enhancements performed by higher definition cameras may be performed after the composition image is formed.
In various embodiments, the composition image may be formed from all image frames by positioning each image frame on the composition image as described herein and interpolating regions between centre of views. This step relates to determining correct parallax and/or edge definition for the composition image. These features may be most important for generating a believable telepresence effect. After this step the image is enhanced, for example using colour or high frequency contrast portions from the higher definition images without changing the parallax of the composition image. As such, the overall definition of the composition image is improved whilst utilising a large number of viewpoints for improved parallax whilst keeping processing or bandwidth required lower. Enhancement may be performed by an enhancement algorithm which may comprise artificial intelligence.
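One illustrative way to enhance colour without disturbing parallax is sketched below (Python with OpenCV, assumed available; the choice of the YCrCb colour space and the function name are assumptions, not the disclosed enhancement algorithm): the luma channel, which carries the parallax-correct edges of the composed region, is kept, while chroma is taken from an aligned higher definition region.

```python
import cv2  # OpenCV, assumed available

def enhance_colour(composed_region, high_def_region):
    """Borrow colour from an aligned higher definition region without
    moving edges: keep the luma (Y) of the composed region and take
    chroma (Cr, Cb) from the higher definition region. Both inputs
    are assumed to be aligned uint8 BGR arrays of equal size."""
    low_ycc = cv2.cvtColor(composed_region, cv2.COLOR_BGR2YCrCb)
    high_ycc = cv2.cvtColor(high_def_region, cv2.COLOR_BGR2YCrCb)
    merged = low_ycc.copy()
    merged[..., 1:] = high_ycc[..., 1:]  # replace chroma only
    return cv2.cvtColor(merged, cv2.COLOR_YCrCb2BGR)
```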
In an alternative embodiment, the different types of cameras shown in Fig. 7b may instead represent the transformed definition of image frames taken by a camera array. The camera array may have 62 cameras of similar definition or quality. However the images from the forty-three "low definition cameras" may be compressed or transformed to a greater extent than the images from the fourteen "medium definition cameras", which in turn are compressed or transformed to a greater extent than the images from the five "high definition cameras". As such, instead of the cameras themselves being of different definition or quality, it is instead the images they contribute to the composition image after processing. This processing may be done before (and/or after) the transmission or transfer of the captured content to user display devices.
Compression or transformation may comprise reducing the colour definition (for example to a lower bit rate or to 1-bit or 2-bit colour). Compression or transformation may comprise a known compression algorithm such as High Efficiency Video Coding (H.265) or Advanced Video Coding (H.264). Compression may comprise downsampling and/or discrete cosine transform and/or quantisation and/or other known compression effects or encoding. A compression algorithm used may be used to remove high frequency information on a frame whilst preserving low frequency information. Compression or transformation may comprise transforming an image into a line image. A line image may be monochromatic, or an image that consists of lines without any gradation of shade or colour, or have an appearance of a vectorised grayscale image. A line image may optionally be a vector image. A line image may be advantageous for the methods described herein as it can be used to provide correct parallax from a viewpoint whilst keeping file size and processing required low. The width or thickness of a line may be such that the line is elongate.
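For illustration, one way to transform a frame into a compact monochrome line image is edge detection, for example as follows (Python with OpenCV, assumed available; the thresholds are illustrative and this is a stand-in for, not a definition of, the line/vector transformation described):

```python
import cv2  # OpenCV, assumed available

def to_line_image(frame_bgr, low_threshold=50, high_threshold=150):
    """Transform a colour frame into a compact monochrome line image
    using Canny edge detection (thresholds are illustrative). The
    output is a binary edge map with no gradation of shade or colour."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Canny(gray, low_threshold, high_threshold)
```

Such an edge map can preserve the parallax-relevant features of a viewpoint while keeping file size and processing low, consistent with the purpose described above.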
Fig. 8a and 8b show a possible arrangement for calibrating a camera. Calibrating a camera may comprise predetermining a relationship between camera angles and locations on images taken with the camera.
Fig. 8a shows a camera array 1020 comprising a camera plane 1019. A plurality of cameras may be arranged on the camera plane. One camera 1021 is shown. This may be for illustration purposes or for example as part of the process of calibrating a single camera. Camera 1021 has camera point of view 1025 and camera direction 1024.
A scene 1010 has a scene pattern 1011. The camera 1021 is arranged a camera array distance 1030 away from the scene 1010. The camera array distance 1030 may define the camera 1021 location with respect to the scene 1010 (for example a known point on the scene 1010).
A camera direction scene location 1012 represents the point on the scene 1010 or scene pattern 1011 that the camera direction 1024 intersects. A camera ray 1028 (which represents a light ray, for example one captured by the camera) extends from the camera point of view 1025 at a camera ray angle 1029 relative to the camera direction 1024. Scene location 1012 represents the point where the camera ray 1028 intersects the scene 1010 or scene pattern 1011.
Fig. 8b shows an image frame 1060. A scene pattern 1061 appears on the image captured by a camera. A camera direction image location 1064 is located on the image at an image frame centre 1065. A scene location 1062 appears on the image.
The image frame 1060 represents an image captured by camera 1021. As such, scene location 1062 on the image frame 1060 corresponds with scene location 1012 (for example it represents the location on the image that captures the scene location 1012). Camera direction image location 1064 corresponds to camera direction scene location 1012. Scene pattern 1061 corresponds to scene pattern 1011.
The scene pattern 1011 may be any suitable pattern. For example, scene pattern 1011 may be a pattern from which a location on the scene 1010 can be identified even when only a portion of the scene pattern 1011 is visible; for example, scene pattern 1011 may be a pattern that does not include repeating portions.
With the information provided by the features of Fig. 8a and 8b, including the geometry and/or relative locations of the features, different camera ray angles (for example 1029) from a camera point of view 1025 can be mapped to locations on an image frame 1060 taken by camera 1021. Once this mapping has been performed, a camera ray angle (for example 1029) can be related to a scene location 1062 regardless of the scene (for example regardless of the depth of the scene, or of its depth where it intersects camera ray 1028).
The mapping of camera ray angles to image locations may be substantially correct, for example for a pin-hole camera model of the camera, or without taking into account lens effects (which may be negligible). Any errors caused by inaccuracies in the mapping may be negligible for producing the telepresence effect as described herein.
The mapping or calibration described may be performed during installation or manufacture of the camera array. The mapping may provide pre-determined parameters to later calculations. It will be appreciated that there are other ways to provide such a calibration or mapping between camera ray angles and image locations.
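For illustration, a mapping of this kind could take the following form under the pin-hole camera model mentioned above. The focal length in pixels and the principal point stand in for values that would be measured using the scene pattern; they are assumptions, not values from the embodiments.

```python
# Minimal sketch of the ray-angle/image-location mapping for a pin-hole
# camera: the camera direction maps to the image frame centre, and a ray
# at an angle to the camera direction maps to an offset from that centre.
import math

def ray_angle_to_pixel(angle_x_rad: float, angle_y_rad: float,
                       focal_px: float = 1000.0,
                       centre: tuple = (960.0, 540.0)) -> tuple:
    """Map a camera ray angle (relative to the camera direction) to an
    image location (u, v)."""
    u = centre[0] + focal_px * math.tan(angle_x_rad)
    v = centre[1] + focal_px * math.tan(angle_y_rad)
    return (u, v)

def pixel_to_ray_angle(u: float, v: float,
                       focal_px: float = 1000.0,
                       centre: tuple = (960.0, 540.0)) -> tuple:
    """Inverse mapping: image location back to a camera ray angle."""
    return (math.atan2(u - centre[0], focal_px),
            math.atan2(v - centre[1], focal_px))
```

Under these assumptions, an extrapolated frame location as described herein could be found by taking the vector from the extrapolated point of view to a camera point of view, expressing it as angles relative to that camera's direction, and applying ray_angle_to_pixel.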
A controller described herein (for example controllers 504 or 505) may comprise any suitable circuitry to cause performance of (for example portions of) the methods described herein. The controller may comprise: control circuitry; and/or processor circuitry; and/or at least one application specific integrated circuit (ASIC); and/or at least one field programmable gate array (FPGA); and/or single or multi-processor architectures; and/or sequential/parallel architectures; and/or at least one programmable logic controller (PLC); and/or at least one microprocessor; and/or at least one microcontroller; and/or a central processing unit (CPU); and/or a graphics processing unit (GPU), to perform the methods.
In various examples, the controller may comprise at least one processor and at least one memory. The memory stores a computer program comprising computer readable instructions that, when read by the processor, cause performance of the methods (or portions thereof) described herein. The computer program may be software or firmware, or may be a combination of software and firmware.
A processor may include at least one microprocessor and may comprise a single core processor, may comprise multiple processor cores (such as a dual core processor or a quad core processor), or may comprise a plurality of processors (at least one of which may comprise multiple processor cores).
Fig. 9a shows a user 1110 wearing a virtual or augmented reality headset 1129 (which may be a head mounted device or glasses or contact lenses) in a user real environment 1121 (for example a physical space in the real world). User 1110 views display screen 1192 through the headset 1129. The user has an eye location 1128 which is tracked by the system. The display screen 1192 has a location in (e.g. relative to) the user real environment 1121 and has display point of views 1141 located on it. A composition plane 1119 with virtual point of views 1125 is located coincident with the virtual display screen 1192. An extrapolated point of view is defined as the eye location 1128 in relation to the display point of views 1141. The virtual display screen may display a composition image using a method as described herein.
The headset 1129 tracks the eye location 1128. The headset 1129 (for example a controller) positions the virtual display screen 1192 in virtual space such that the screen does not move (for example in relation to movements of the user 1110 and/or in relation to the user real environment 1121 or a user virtual environment). Virtual display screen 1192 may have a fixed location (for example in relation to a real or virtual environment that the user can move in, or from the perspective of the user). The user 1110 can move, and thus move their point of view, in relation to the display screen 1192.
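By way of example only, the extrapolated point of view for the headset case could be computed as below, expressing the tracked eye location in a coordinate frame attached to the (virtual) display screen. The basis vectors are assumed orthonormal and all names are illustrative.

```python
# Minimal sketch: express the tracked eye location relative to the plane
# that carries the display nodes / virtual centre of views, yielding the
# extrapolated point of view used elsewhere in the method.
import numpy as np

def extrapolated_point_of_view(eye_world: np.ndarray,
                               screen_origin: np.ndarray,
                               screen_x: np.ndarray,
                               screen_y: np.ndarray,
                               screen_normal: np.ndarray) -> np.ndarray:
    """Return (x, y, depth): the eye's in-plane offset and its distance
    from the screen plane, in the screen's coordinate frame."""
    d = eye_world - screen_origin
    return np.array([d @ screen_x, d @ screen_y, d @ screen_normal])

# Example: an eye 0.6 m in front of a screen whose plane is the world XY plane.
eye = extrapolated_point_of_view(np.array([0.1, 0.2, 0.6]),
                                 np.zeros(3),
                                 np.array([1.0, 0.0, 0.0]),
                                 np.array([0.0, 1.0, 0.0]),
                                 np.array([0.0, 0.0, 1.0]))
```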
Fig. 9b shows a similar arrangement to Fig. 9a but with a camera array 1220. Fig. 9b shows a user 1210 wearing a virtual or augmented reality headset 1229 (which may be a headset or glasses or contact lenses) in a user real environment 1221 (for example a physical space in the real world). In the Fig. 9b example the headset 1229 has a transparent portion for viewing the user's eyes. User 1210 views display screen 1292 through the headset 1229. The display screen 1292 has a location in (e.g. relative to) the user real environment 1221 and has display point of views 1241 located on it.
A composition plane 1219 and virtual centre of views 1225 are located coincident with the virtual display screen 1292 (e.g. on the same plane). The virtual display screen may display a composition image using a method as described herein.
The camera array 1220 captures images of the user 1210 for transmission to a remote user (for example for two-way telepresence). Headset 1229 positions the display screen 1292 on the camera plane 1219. Such an arrangement may provide an improved two-way telepresence system, for example without interference (for example blocking or a reduced telepresence effect) between the cameras and the display screen. With the virtual display screen 1292 coincident with the camera plane 1219, eye gaze and eye focus are preserved across the telepresence communication.
In other embodiments a controller may receive a separation distance (or offset) between a remote telepresence device's display screen (e.g. and display nodes) and camera plane. The controller may position the local display screen at the same separation (or offset) from the local camera plane.
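A minimal sketch of this separation-matching step, with illustrative names and actuator limits, might be:

```python
# Illustrative only: mirror the remote device's display-screen-to-camera-plane
# separation on the local device, clamped to the local actuators' travel.
def mirror_separation(remote_separation_m: float,
                      actuator_min_m: float = 0.0,
                      actuator_max_m: float = 0.5) -> float:
    """Return the target offset of the local display screen from the
    local camera plane, in metres."""
    return min(max(remote_separation_m, actuator_min_m), actuator_max_m)
```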

Claims (25)

We claim:
  1. A method of providing an extrapolated view, the method comprising the steps of: receiving a plurality of image frames, wherein each image frame corresponds to a camera point of view of a camera of a camera array; determining the location of an extrapolated point of view a distance from the rear side of a camera plane formed by the camera points of view; identifying an extrapolated frame location on each image frame that corresponds with a light ray captured by a corresponding camera, wherein the light ray is aligned with a vector from the extrapolated point of view to the corresponding camera point of view; providing a composition image comprising a plurality of virtual centre of views arranged to correspond with the arrangement of the camera point of views, wherein each virtual centre of view corresponds with a camera point of view; and positioning each image frame on the composition image such that its extrapolated frame location is coincident with its image frame's corresponding virtual centre of view.
  2. The method according to claim 1, wherein: the plurality of image frames comprise one or more first image frames and a plurality of second image frames; and wherein the plurality of second image frames are lower definition than the plurality of first image frames; wherein a lower definition image has a smaller file size per pixel; and wherein the number of second image frames is greater than the number of first image frames; and wherein the step of positioning each image frame on the composition image further comprises enhancing portions of the composition image corresponding to second image frames with information from the first image frames, and wherein enhancing does not alter the locations of features on the composition image.
  3. The method according to claim 2, wherein the second image frames are line images.
  4. The method according to claim 2 or 3, wherein the second image frames are vector images.
  5. The method according to any of claims 2 to 4, wherein: the second image frames have 1-bit colour and the first image frames have greater than 1-bit colour; and enhancing the composition image comprises enhancing the colour of the composition image.
  6. The method according to any of claims 2 to 5, wherein: the plurality of image frames further comprises a plurality of third image frames; wherein the third image frames are lower definition than the plurality of first image frames and higher definition than the plurality of second image frames; and the number of third image frames is greater than the number of first image frames and less than the number of second image frames.
  7. The method according to claim 6, wherein the number of second image frames is greater than 200 and the number of third image frames is greater than 100.
  8. The method according to any previous claim, wherein the step of determining the location of an extrapolated point of view comprises identifying an eye location of a user in relation to display nodes on a display; wherein each display node corresponds to a (e.g. remote) camera of the camera array; and wherein the method further comprises displaying the composition image on the display such that virtual centre of views of the composition image are coincident with display nodes.
  9. The method according to claim 8, wherein the display is provided and/or viewed using an augmented or virtual reality headset; and wherein the display comprises a virtual screen a distance from the user; wherein the display nodes are located on the virtual screen; and wherein the extrapolated point of view corresponds with an eye location relative to the position of the display nodes.
  10. The method according to any previous claim, wherein the method further comprises the steps of: capturing images of the local scene in front of the display screen using a local camera array; transmitting the images to a remote telepresence display device for enabling two-way telepresence; receiving geometry from a remote telepresence device comprising a separation distance between a remote display screen and a remote camera plane; and actuating the local display screen (or locating the local virtual screen) to be a distance from the camera plane equal to the received separation distance.
  11. The method according to any previous claim, wherein after the step of positioning each image frame, the method further comprises applying transparency to overlapping portions of image frames such that the combined effect of overlapping transparencies renders the composition image opaque across the area covered by the image frames.
  12. The method according to any previous claim, wherein after the step of positioning each image frame, the method further comprises applying an interpolation algorithm to resolve portions of the composition image between the virtual centre of views.
  13. The method according to claim 12, wherein applying an interpolation algorithm comprises identifying the same feature in a plurality of image frames; and wherein the method further comprises positioning the feature at a single interpolated location on the composition image.
  14. The method according to claim 13, wherein the step of positioning comprises transforming (for example warping, stretching or shearing) a portion of such image frames distal to their corresponding extrapolated frame location along radial lines extending away from the extrapolated frame location.
  15. The method according to any of claims 12 to 14, wherein the interpolation algorithm comprises an artificial neural network.
  16. A camera array suitable for capturing images for the method according to claim 1, wherein the camera array is planar with a height less than 3 metres and a width less than 3 metres, and comprises one or more first definition cameras and a plurality of second definition cameras; wherein the first definition cameras have higher definition than the second definition cameras; wherein a higher definition camera outputs an image with a larger file size per pixel when at maximum camera settings; wherein the number of cameras is greater than 200; and wherein the number of second definition cameras is greater than the number of first definition cameras.
  17. The camera array according to claim 16, wherein the first definition cameras output a file size per pixel when at maximum camera settings greater than 5 times that of the second definition cameras.
  18. The camera array according to claim 17, wherein the camera array further comprises a controller; and wherein the controller is configured to convert the images from the second definition cameras into monochrome line images or vector images.
  19. The camera array according to claim 17 or 18, wherein the camera array further comprises a plurality of third definition cameras; wherein the third definition cameras have higher definition than the second definition cameras and lower definition than the first definition cameras; and wherein the number of second definition cameras is greater than the number of third definition cameras.
  20. A telepresence system comprising: a display screen for displaying a composition image to a user, wherein the display screen comprises a plurality of display nodes, and wherein each display node corresponds to a camera of a camera array and the display nodes are spaced according to the geometry of the camera array; an eye tracking device for identifying the three-dimensional location of an eye with respect to the display nodes; and a controller, wherein the controller is configured to: receive an eye location from the eye tracking device; receive a plurality of image frames, wherein each image frame corresponds with a camera point of view of a camera of a camera array; determine the location of an extrapolated point of view a distance from the rear side of a camera plane formed by the camera points of view that corresponds with the eye location; identify an extrapolated frame location on each image frame that corresponds with a light ray captured by a corresponding camera at its camera point of view, wherein the light ray is aligned with a vector from the extrapolated point of view to the corresponding camera point of view; provide a composition image comprising a plurality of virtual centre of views arranged to correspond with the arrangement of the camera point of views, wherein each virtual centre of view corresponds with a camera point of view; position each image frame on the composition image such that its extrapolated frame location is coincident with its image frame's corresponding virtual centre of view; interpolate portions of the composition image between the virtual centre of views; and display the composition image on the display screen such that the virtual centre of views are co-located with corresponding display nodes.
  21. The telepresence system according to claim 20, further comprising a local camera array; wherein the display screen is located proximal the local camera array, and wherein each camera of the local camera array is positioned with respect to the display screen such that it can capture images through the display screen.
  22. The telepresence system according to claim 20 or 21, wherein the telepresence system comprises an augmented or virtual reality headset; and wherein the display screen is a virtual screen positioned in the environment of the user when viewed through the headset.
  23. The telepresence system according to claim 21, wherein the display screen comprises holes, and wherein each local camera comprises a capture portion positioned within a hole.
  24. The telepresence system according to claim 21, wherein the display screen is semitransparent.
  25. The telepresence system according to any of claims 21 to 24, wherein the display screen is moveable by actuators to increase or decrease a separation distance from the camera array.
GB2402230.3A 2024-02-16 2024-02-16 Telepresence system Pending GB2638245A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB2402230.3A GB2638245A (en) 2024-02-16 2024-02-16 Telepresence system
PCT/GB2025/050303 WO2025172728A1 (en) 2024-02-16 2025-02-17 Telepresence system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2402230.3A GB2638245A (en) 2024-02-16 2024-02-16 Telepresence system

Publications (1)

Publication Number Publication Date
GB2638245A true GB2638245A (en) 2025-08-20

Family

ID=96500058

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2402230.3A Pending GB2638245A (en) 2024-02-16 2024-02-16 Telepresence system

Country Status (1)

Country Link
GB (1) GB2638245A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998030015A2 (en) * 1996-12-29 1998-07-09 Weblige Ltd. Model-based view extrapolation for interactive virtual reality systems
US20040222987A1 (en) * 2003-05-08 2004-11-11 Chang Nelson Liang An Multiframe image processing
US20130321575A1 (en) * 2012-05-31 2013-12-05 Microsoft Corporation High definition bubbles for rendering free viewpoint video
US10327014B2 (en) * 2016-09-09 2019-06-18 Google Llc Three-dimensional telepresence system
US20220070426A1 (en) * 2020-08-25 2022-03-03 Samsung Electronics Co., Ltd. Restoration of the fov of images for stereoscopic rendering
US20230217004A1 (en) * 2021-02-17 2023-07-06 flexxCOACH VR 360-degree virtual-reality system for dynamic events
CN116503536A (en) * 2023-06-27 2023-07-28 深圳臻像科技有限公司 A Light Field Rendering Method Based on Scene Layering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Szeliski, R., "Image mosaicing for tele-reality applications", Proceedings of the Second IEEE Workshop on Applications of Computer Vision, Sarasota, FL, USA, 5-7 Dec. 1994, IEEE Computer Society, Los Alamitos, CA, USA, pages 44-53. *
Lei et al., "An Efficient Image-Based Telepresence System for Videoconferencing", IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 3, March 2004, pages 335-347. *

Similar Documents

Publication Publication Date Title
JP7804725B2 (en) Neural Blending for Novel View Synthesis
US10645369B2 (en) Stereo viewing
US11849104B2 (en) Multi-resolution multi-view video rendering
US8189035B2 (en) Method and apparatus for rendering virtual see-through scenes on single or tiled displays
EP4252412A1 (en) Three-dimensional (3d) facial feature tracking for autostereoscopic telepresence systems
US20250184467A1 (en) Image signal representing a scene
CN107545537A (en) A kind of method from dense point cloud generation 3D panoramic pictures
US12143561B2 (en) Image generation system and method
EP3918782A1 (en) Image signal representing a scene
GB2638245A (en) Telepresence system
WO2025172728A1 (en) Telepresence system
Thatte et al. Real-World Virtual Reality With Head-Motion Parallax
JP7556352B2 (en) Image characteristic pixel structure generation and processing
EP4631249A1 (en) Offset low discrepancy spherical sampling for image rendering
HK1233091B (en) Stereo viewing