
AU2019201822A1 - BRDF scanning using an imaging capture system - Google Patents


Info

Publication number
AU2019201822A1
Authority
AU
Australia
Prior art keywords
images
marker
camera
image
alignment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2019201822A
Inventor
Matthew Raphael Arnison
Peter Alleine Fletcher
Thai Quan Huynh-Thu
David Karlov
Timothy Stephen Mason
David Peter Morgan-Mar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Priority to AU2019201822A
Publication of AU2019201822A1
Status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 7/579 Depth or shape recovery from multiple images from motion
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01B MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B 11/00 Measuring arrangements characterised by the use of optical techniques
    • G01B 11/002 Measuring arrangements characterised by the use of optical techniques for measuring two or more coordinates
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01B MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B 11/00 Measuring arrangements characterised by the use of optical techniques
    • G01B 11/24 Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N 21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N 21/55 Specular reflectivity
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/514 Depth or shape recovery from specularities
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/529 Depth or shape recovery from texture
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N 21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N 21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N 21/55 Specular reflectivity
    • G01N 2021/557 Detecting specular reflective parts on sample
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

A method of determining the bidirectional reflectance distribution function (BRDF) of an object (110) using a marker (500) placed on the object (110), the object (110) having an unknown shape. The method comprises: capturing (410), by an image capturing device, a plurality of images of the object (110), the object (110) being illuminated by a light source at a fixed location, the image capturing device capturing the plurality of images from different positions in an un-calibrated manner; aligning (420) corresponding regions of the plurality of images using the marker; determining (430) a camera pose for each of the plurality of images based on the alignment of the corresponding regions of the plurality of images; determining (440) a shape of the object using a 3D alignment method based on the alignment of the corresponding regions of the plurality of images; and estimating (470) the BRDF of the object (110) from the aligned regions and the determined shape of the object (110).

Description

BRDF SCANNING USING AN IMAGING CAPTURE SYSTEM
TECHNICAL FIELD
This invention relates to a method and apparatus of image processing to capture the surface appearance of a material.
BACKGROUND
Capturing a detailed 3-dimensional (3D) model of an object has applications in many fields including industrial design, computer graphics, cultural heritage preservation, marketing and commerce, and other areas. A 3D model of an object can be thought of as comprising the shape information, which is purely geometrical, plus the surface appearance information, which governs how light interacts with the surface.
Characterising the surface appearance over the surface of a 3D object requires capturing multiple colour samples at many different points across the surface of the object. In many cases, the data collection can be combined with capture of the shape of the 3D object, as camera images may be used to estimate both the 3D geometry and either the diffuse colour or the bidirectional reflectance distribution function (BRDF).
One method of estimating 3D object shape and BRDF is to assemble a rigid framework of cameras and lights, such that the cameras and lights are fixed in known relative positions and angles. The object to be scanned can be placed inside the framework and multiple images of the object can be captured by the cameras, under different, known, lighting conditions. The camera images can then be collated and used to assemble a 3D model of the object's shape. Also, because the cameras are fixed and their positions known, the colour samples from the cameras can be collated with illumination and viewing angles to provide samples of the BRDF at points on the object surface, then these samples can be used to estimate the BRDF. This method has the disadvantage that it requires a large framework, much bigger than the object being scanned, as well as multiple cameras and lights, producing problems of expense and portability.
Another method of capturing the 3D shape of an object is to capture multiple images of the object from different viewing angles, and then to apply a shape estimation algorithm such as structure from motion. This method can make use of hand-held photos from an off-the-shelf camera. However, state-of-the-art structure from motion algorithms have residual errors in the camera pose estimation which result in multi-pixel alignment errors between the various camera views. A single view may be selected to provide an estimate of the diffuse colour across the object, but the misalignment prevents multiple views from being collated into samples suitable for BRDF estimation. Attempting to estimate BRDF from such a system results in large errors.
Another method similar to the structure from motion or photogrammetry method uses a single imaging device to capture a large number of images of the object from different viewing angles in a given lighting environment. Feature points are identified and matched across the images, camera poses are estimated and multi-view stereo is used to estimate a 3D mesh representing the object shape. However, the estimated camera poses can be inaccurate, especially for glossy objects whose appearance changes rapidly with view angle. The camera pose inaccuracy means that it is difficult to accurately compare RGB values for the same point across multiple views. Further, the captured images are typically used to estimate only a single RGB colour value for each point on the 3D mesh. The estimation of a single RGB colour value renders the method unsuitable to capture the surface appearance of an object or the BRDF of the surface. The estimation of a single RGB colour value for each point on the object means that the lighting of the environment cannot be disambiguated from the real colour of the surface, i.e. diffuse information and specular information cannot be distinguished or separated. For example, this method cannot distinguish a matte surface from a glossy surface. This creates unrealistic digital rendering of the object in applications where photorealistic rendering is needed in various lighting conditions, such as visual effects or e-commerce.
A variation on the structure from motion method estimates the spatially varying BRDF using a sequence of images captured from a video camera and a laser scanner. The information from multiple views is used to reconstruct the BRDF. However, the estimated camera poses may be inaccurate which may reduce the accuracy of the BRDF. In addition, the laser scan requires additional time, expertise and expensive equipment. Laser scans may also have reduced depth accuracy for glossy objects due to weak retro-reflection.
One method of capturing the BRDF of an unknown surface of an object uses a single camera with a flash attached to it. The camera then captures an image of the unknown surface with a camera viewing angle such that the camera sensor is parallel to the unknown surface. This method uses a deep learning network obtained from a training set of surfaces with known BRDF to produce an estimation of four appearance maps: a diffuse albedo map, a specular albedo map, a normal map and a roughness map. This method, however, is unable to capture the BRDF of surfaces that are highly reflective because the camera's sensor response can be saturated. Saturation of the camera sensor then results in the camera sensor being unable to capture reflective information for sharp and saturated specular highlights from the surface. This deep learning method is also hindered by the necessity to use low-resolution images in the training data to allow the deep network to converge to a model of the BRDF. Furthermore, such a method does not accurately estimate the BRDF of a surface with mixed properties, such as matte areas on a glossy surface.
The images captured to estimate the shape and colour of an object need to be aligned. One method of improving the alignment performance of a 3D scanning system is to employ marker objects placed in the scene or on the object being scanned, to provide detectable reference points in three dimensions. The detectable reference points may then be aligned between captured images to reconstruct the camera poses and a 3D model of the object being scanned. A disadvantage of some marker systems is that the markers, while easily detectable, provide positional accuracy no better than several image pixels, resulting in misalignments too large to allow accurate shape and BRDF estimation. Other marker systems provide planar markers, which are unsuitable for scanning non-planar surfaces because of the inability of the markers to conform to the shape of the surface. Some marker systems provide feature points which are detectable in some images, but may be mistaken for different feature points or not detected at all in other images due to lighting and reflectance differences, thus leading to misregistration of the images.
Existing methods suffer from one or several of the following limitations: the system can only scan the BRDF of a flat surface, the scanning system is too constrained, complex and cannot be deployed quickly, the scanning system requires in-situ calibration before use, the resulting scan provides only a frontal view of the surface, and the lighting information from the ambient lighting cannot be disambiguated from the reflective properties of the surface of the object.
There is need for a scanning method and system that can capture both the 3D shape of an object and multiple well-aligned samples of densely distributed points on the surface to enable accurate BRDF estimation at all of those points, using relatively inexpensive, unconstrained and lightweight equipment, such as an off-the-shelf consumer camera.
SUMMARY
It is an object of aspects of the present disclosure to substantially overcome or at least ameliorate one or more of the above disadvantages.
According to one aspect of the present disclosure, there is provided a method of determining the bidirectional reflectance distribution function (BRDF) of an object using a marker placed on the object, the object having an unknown shape, the method comprising: capturing, by an image capturing device, a plurality of images of the object, the object being illuminated by a light source at a fixed location, the image capturing device capturing the plurality of images from different positions in an un-calibrated manner; aligning corresponding regions of the plurality of images using the marker; determining a camera pose for each of the plurality of images based on the alignment of the corresponding regions of the plurality of images; determining a shape of the object using a 3D alignment method based on the alignment of the corresponding regions of the plurality of images; and estimating the BRDF of the object from the aligned regions and the determined shape of the object.
According to another aspect of the present disclosure, there is provided a device comprising: a processor; a memory in communication with the processor, wherein the memory comprises a computer application program that is executable by the processor, the computer application program comprising a method of determining the bidirectional reflectance distribution function (BRDF) of an object using a marker placed on the object, the object having an unknown shape, the method comprising: capturing, by an image capturing device, a plurality of images of the object, the object being illuminated by a light source at a fixed location, the image capturing device capturing the plurality of images from different positions in an un-calibrated manner; aligning corresponding regions of the plurality of images using the marker; determining a camera pose for each of the plurality of images based on the alignment of the corresponding regions of the plurality of images; determining a shape of the object using a 3D alignment method based on the alignment of the corresponding regions of the plurality of images; and estimating the BRDF of the object from the aligned regions and the determined shape of the object.
BRIEF DESCRIPTION OF THE DRAWINGS
One or more embodiments of the invention will now be described with reference to the following drawings, in which:
Fig. 1 illustrates an imaging system for capturing images of a 3-dimensional object;
Fig. 2 is a flow diagram illustrating a method of estimating an object shape and BRDF from images captured by the system of Fig. 1;
Fig. 3 is a flow diagram illustrating a method for capturing and rendering the BRDF of the surface of the object from images captured by the system of Fig. 1;
Fig. 4 is a schematic flow diagram of a sub-process of the method of Fig. 3 for capturing the BRDF of the surface of the object;
Fig. 5 is a diagram of a marker configured to aid the alignment of images captured from different camera poses;
Fig. 6 is a diagram of a high-bandwidth alignment pattern of the marker of Fig. 5;
Fig. 7 is a flow diagram illustrating a method of capturing images of an object for the purpose of BRDF estimation;
Fig. 8 is a diagram of a marker placed on an object;
Fig. 9 is a flow diagram illustrating a method of aligning the marker of Fig. 5, as performed in the sub-process of Fig. 4;
Fig. 10 is a flow diagram illustrating a method of identifying a feature point in an aligned marker pattern as performed in the method of Fig. 9;
Fig. 11 is a flow diagram illustrating a method of surface fitting of a point cloud as performed in the sub-process of Fig. 4;
Fig. 12 is a flow diagram illustrating a method of determining the scale of a point cloud with respect to the marker of Fig. 5, as performed in the sub-process of Fig. 4;
Fig. 13 is a flow diagram illustrating a surface parametrisation method as performed in the sub-process of Fig. 4;
Fig. 14 is a flow diagram illustrating texture projection to produce texture maps as performed in the sub-process of Fig. 4;
Fig. 15 is a flow diagram illustrating a method of aligning texture maps as performed in the sub-process of Fig. 4;
Fig. 16 is a flow diagram illustrating a method of point cloud refinement;
Fig. 17 is a flow diagram illustrating a method of BRDF estimation; and
Figs. 18A and 18B form a schematic block diagram of a general purpose computer system upon which arrangements described can be practiced.
DETAILED DESCRIPTION
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
Computer Description
Figs. 18A and 18B depict a general-purpose computer system 1800, upon which the various arrangements described can be practiced.
As seen in Fig. 18A, the computer system 1800 includes: a computer module 1801; input devices such as a keyboard 1802, a mouse pointer device 1803, a scanner 1826, a camera 1827, and a microphone 1880; and output devices including a printer 1815, a display device 1814 and loudspeakers 1817. An external Modulator-Demodulator (Modem) transceiver device 1816 may be used by the computer module 1801 for communicating to and from a communications network 1820 via a connection 1821. The communications network 1820 may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 1821 is a telephone line, the modem 1816 may be a traditional "dial-up" modem. Alternatively, where the connection 1821 is a high capacity (e.g., cable) connection, the modem 1816 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 1820.
The computer module 1801 typically includes at least one processor unit 1805, and a memory unit 1806. For example, the memory unit 1806 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 1801 also includes a number of input/output (I/O) interfaces including: an audio-video interface 1807 that couples to the video display 1814, loudspeakers 1817 and microphone 1880; an I/O interface 1813 that couples to the keyboard 1802, mouse 1803, scanner 1826, camera 1827 and optionally a joystick or other human interface device (not illustrated); and an interface 1808 for the external modem 1816 and printer 1815. In some implementations, the modem 1816 may be incorporated within the computer module 1801, for example within the interface 1808. The computer module 1801 also has a local network interface 1811, which permits coupling of the computer system 1800 via a connection 1823 to a local-area communications network 1822, known as a Local Area Network (LAN). As illustrated in Fig. 18A, the local communications network 1822 may also couple to the wide-area network 1820 via a connection 1824, which would typically include a so-called "firewall" device or device of similar functionality. The local network interface 1811 may comprise an Ethernet circuit card, a Bluetooth® wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 1811.
The I/O interfaces 1808 and 1813 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 1809 are provided and typically include a hard disk drive (HDD) 1810. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 1812 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable, external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the system 1800.
The components 1805 to 1813 of the computer module 1801 typically communicate via an interconnected bus 1804 and in a manner that results in a conventional mode of operation of the computer system 1800 known to those in the relevant art. For example, the processor 1805 is coupled to the system bus 1804 using a connection 1818. Likewise, the memory 1806 and optical disk drive 1812 are coupled to the system bus 1804 by connections 1819. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun Sparcstations, Apple Mac™ or like computer systems.
The method of estimating the shape of an object and the BRDF of the object may be implemented using the computer system 1800 wherein the processes of Figs. 2-4, 7, and 9-17, to be described, may be implemented as one or more software application programs 1833 executable within the computer system 1800. In particular, the steps of the method of estimating the shape of an object and the BRDF of the object are effected by instructions 1831 (see Fig. 18B) in the software 1833 that are carried out within the computer system 1800. The software instructions 1831 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the object shape estimation and BRDF estimation methods and a second part and the corresponding code modules manage a user interface between the first part and the user.
The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 1800 from the computer readable medium, and then executed by the computer system 1800. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 1800 preferably effects an advantageous apparatus for estimating an object and BRDF of the object.
The software 1833 is typically stored in the HDD 1810 or the memory 1806. The software is loaded into the computer system 1800 from a computer readable medium, and executed by the computer system 1800. Thus, for example, the software 1833 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 1825 that is read by the optical disk drive 1812. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 1800 preferably effects an apparatus for estimating an object and BRDF of the object.
In some instances, the application programs 1833 may be supplied to the user encoded on one or more CD-ROMs 1825 and read via the corresponding drive 1812, or alternatively may be read by the user from the networks 1820 or 1822. Still further, the software can also be loaded into the computer system 1800 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 1800 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray™ Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 1801. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 1801 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The second part of the application programs 1833 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 1814. Through manipulation of typically the keyboard 1802 and the mouse 1803, a user of the computer system 1800 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 1817 and user voice commands input via the microphone 1880.
Fig. 18B is a detailed schematic block diagram of the processor 1805 and a "memory" 1834. The memory 1834 represents a logical aggregation of all the memory modules (including the HDD 1809 and semiconductor memory 1806) that can be accessed by the computer module 1801 in Fig. 18A.
When the computer module 1801 is initially powered up, a power-on self-test (POST) program 1850 executes. The POST program 1850 is typically stored in a ROM 1849 of the semiconductor memory 1806 of Fig. 18A. A hardware device such as the ROM 1849 storing software is sometimes referred to as firmware. The POST program 1850 examines hardware within the computer module 1801 to ensure proper functioning and typically checks the processor 1805, the memory 1834 (1809, 1806), and a basic input-output systems software (BIOS) module 1851, also typically stored in the ROM 1849, for correct operation. Once the POST program 1850 has run successfully, the BIOS 1851 activates the hard disk drive 1810 of Fig. 18A. Activation of the hard disk drive 1810 causes a bootstrap loader program 1852 that is resident on the hard disk drive 1810 to execute via the processor 1805. This loads an operating system 1853 into the RAM memory 1806, upon which the operating system 1853 commences operation. The operating system 1853 is a system level application, executable by the processor 1805, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
The operating system 1853 manages the memory 1834 (1809, 1806) to ensure that each process or application running on the computer module 1801 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 1800 of Fig. 18A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 1834 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 1800 and how such is used.
As shown in Fig. 18B, the processor 1805 includes a number of functional modules including a control unit 1839, an arithmetic logic unit (ALU) 1840, and a local or internal memory 1848, sometimes called a cache memory. The cache memory 1848 typically includes a number of storage registers 1844 - 1846 in a register section. One or more internal busses 1841 functionally interconnect these functional modules. The processor 1805 typically also has one or more interfaces 1842 for communicating with external devices via the system bus 1804, using a connection 1818. The memory 1834 is coupled to the bus 1804 using a connection 1819.
The application program 1833 includes a sequence of instructions 1831 that may include conditional branch and loop instructions. The program 1833 may also include data 1832 which is used in execution of the program 1833. The instructions 1831 and the data 1832 are stored in memory locations 1828, 1829, 1830 and 1835, 1836, 1837, respectively. Depending upon the relative size of the instructions 1831 and the memory locations 1828-1830, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 1830. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 1828 and 1829.
In general, the processor 1805 is given a set of instructions which are executed therein. The processor 1805 waits for a subsequent input, to which the processor 1805 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 1802, 1803, data received from an external source across one of the networks 1820, 1822, data retrieved from one of the storage devices 1806, 1809 or data retrieved from a storage medium 1825 inserted into the corresponding reader 1812, all depicted in Fig. 18A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 1834.
The disclosed object and object's BRDF estimation arrangements use input variables 1854, which are stored in the memory 1834 in corresponding memory locations 1855, 1856, 1857. The object and object's BRDF estimation arrangements produce output variables 1861, which are stored in the memory 1834 in corresponding memory locations 1862, 1863, 1864. Intermediate variables 1858 may be stored in memory locations 1859, 1860, 1866 and 1867.
Referring to the processor 1805 of Fig. 18B, the registers 1844, 1845, 1846, the arithmetic logic unit (ALU) 1840, and the control unit 1839 work together to perform sequences of micro operations needed to perform "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 1833. Each fetch, decode, and execute cycle comprises:
a fetch operation, which fetches or reads an instruction 1831 from a memory location 1828, 1829, 1830;
a decode operation in which the control unit 1839 determines which instruction has been fetched; and
an execute operation in which the control unit 1839 and/or the ALU 1840 execute the instruction.
Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 1839 stores or writes a value to a memory location 1832.
Each step or sub-process in the processes of Figs. 2-4, 7, and 9-17 is associated with one or more segments of the program 1833 and is performed by the register section 1844, 1845, 1846, the ALU 1840, and the control unit 1839 in the processor 1805 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 1833.
The method of object and object's BRDF estimation may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub-functions of Figs. 2-4, 7, and 9-17. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.
Imaging system and method for estimating a three-dimensional object
Fig. 1 illustrates an imaging system 100 for capturing 3-dimensional geometry information and colour sample information from a 3-dimensional object 110. An image of the object 110 is captured from each of N different camera positions 120, 130, 140, and possibly further different camera positions (not shown). The images may be captured by one or more image capturing devices (such as a camera or a video camera). The images may be captured by multiple cameras, one camera occupying each camera position 120, 130, 140, or by a single camera moved serially to each camera position 120, 130, 140, or by an intermediate number of cameras moved serially such that at least one image is captured from each camera position 120, 130, 140. Alternatively, the images may be captured by a video camera moving between the different camera positions 120, 130, 140.
The camera positions 120, 130, 140 are not fixed and are not calibrated. Therefore, the image capturing device captures images of the object 110 from different camera positions in an un-calibrated manner.
The set of N images of the object 110 captured from the N camera positions 120, 130, 140 (and possibly further different camera positions, not shown) may be used to estimate the 3-dimensional geometry of the object 110 and the surface material appearance at points on the surface of the object 110.
Fig. 2 is a flow diagram illustrating an object estimation method 200 of estimating the 3-dimensional shape of the object 110 and surface material appearance data of the object 110, given the set of images of the object 110 captured from the camera positions 120, 130, 140 (and possibly further different camera positions).
The method 200 can be implemented as a software application program 1833 executable by the processor 1805 of the computer system 1800.
The method 200 begins with a data receiving step 210, in which the set of images captured from the camera positions 120, 130, 140 (and possibly further different camera positions) are received by a processor 1805 (as shown in Fig. 18A).
The method 200 then proceeds from step 210 to step 220.
In a structure from motion step 220, the processor 1805 applies a structure from motion algorithm to determine a point cloud representation of the 3-dimensional structure of the object 110, and an estimate of the 3-dimensional pose (position and orientation) of the camera for each camera position 120, 130, 140 (and possibly further different camera positions). Structure from motion is a class of techniques used to reconstruct 3D geometry from 2D images of objects (e.g., 110) captured from different viewing positions, using photogrammetric alignment of correspondence points, similar to 3D reconstruction from stereographic imaging. The structure from motion algorithm may be an implementation as available in commercial software packages (e.g., Agisoft PhotoScan) or open source libraries (e.g., OpenCV). Structure from motion is also known as photogrammetry.
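As a concrete illustration, a two-view version of this step can be sketched with OpenCV as follows. The image file names, ORB feature settings and the assumed focal length are illustrative placeholders rather than values from the disclosure; a full pipeline would repeat this over all views and refine the result.

```python
# Minimal two-view structure-from-motion sketch using OpenCV (not the patent's
# implementation). Assumes two overlapping captures "view1.jpg"/"view2.jpg"
# and a rough focal length in pixels.
import cv2
import numpy as np

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

# Detect and match feature points across the two views.
orb = cv2.ORB_create(5000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Assumed pinhole intrinsics (focal length f in pixels, principal point cx, cy).
f, cx, cy = 3000.0, img1.shape[1] / 2, img1.shape[0] / 2
K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1]])

# Relative camera pose from the essential matrix, then triangulate a sparse
# point cloud of the scene (defined up to an unknown global scale).
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
point_cloud = (pts4d[:3] / pts4d[3]).T   # N x 3
```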
In a meshing step 230, the point cloud representation is converted to a mesh representation of the surface shape of the object 110. An example of the meshing algorithm is normal reconstruction followed by Poisson surface reconstruction. Another example of a meshing algorithm is ball pivoting surface reconstruction. The method 200 then proceeds from step 230 to step 240.
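A sketch of the meshing step using Open3D's Poisson surface reconstruction is given below. Open3D is only one example toolkit (the disclosure does not prescribe a library), and the sampled sphere stands in for the point cloud produced by step 220.

```python
import numpy as np
import open3d as o3d

# Synthetic stand-in for the point cloud from step 220; replace with the real
# reconstructed points in practice.
sphere = o3d.geometry.TriangleMesh.create_sphere(radius=1.0)
pcd = sphere.sample_points_poisson_disk(2000)

# Poisson reconstruction requires oriented normals.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.2, max_nn=30))
pcd.orient_normals_consistent_tangent_plane(k=15)

mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=8)

# Trim poorly supported faces that extrapolate beyond the input points.
d = np.asarray(densities)
mesh.remove_vertices_by_mask(d < np.quantile(d, 0.02))
```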
In a projection step 240, each image from the set of images captured from the camera positions 120, 130, 140 (and possibly further different camera positions) is projected on to the mesh representation. Projection from a captured image to the mesh may be performed by raycasting each image pixel using perspective projection with the camera poses estimated in step 220 to determine which visible mesh face (such as a triangle or quadrilateral) it intersects, then performing a weighted colour averaging of all colour samples assigned to each mesh face, to assign a colour per mesh face. Alternatively, the raycasting may additionally determine the position within the mesh face, map the position on the mesh face to a position in a texture image associated with the mesh, and determine a colour value of the position in the texture image. The method 200 proceeds from step 240 to step 250.
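The core of the projection in step 240, mapping a surface point into one captured view and reading out its colour, can be sketched as follows. Here K, R and t stand for the camera intrinsics and pose estimated in step 220; occlusion testing against the mesh is omitted for brevity.

```python
import numpy as np

def sample_colour(image, point_3d, K, R, t):
    """Project a world-space point into `image` with a pinhole model and
    return its RGB value, or None if it falls outside the frame.
    A complete implementation would also test occlusion against the mesh."""
    cam = R @ point_3d + t                   # world -> camera coordinates
    if cam[2] <= 0:                          # behind the camera
        return None
    uvw = K @ cam
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]  # perspective division -> pixels
    ui, vi = int(round(u)), int(round(v))
    h, w = image.shape[:2]
    if 0 <= ui < w and 0 <= vi < h:
        return image[vi, ui]
    return None
```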
In a BRDF estimation step 250, the BRDF of each triangle of the mesh is estimated from the projected colour information on the triangle from each of the camera positions 120, 130, 140 (and possibly further different camera positions), plus the camera poses as estimated in structure from motion step 220. The method 200 terminates at the conclusion of step 250.
However, due to the difficulty of solving the structure from motion problem, typically the colour information from each camera view is substantially misaligned with respect to the other camera views. This misalignment means that using colour samples from different camera views to estimate the BRDF of points on the object 110 will produce inaccurate diffuse colour estimates and inaccurate BRDF parameters.
BRDF capture and estimation
Fig. 3 is a flowchart illustrating the steps of a method 300 for obtaining a rendered image of a BRDF of an unknown object (110). The method 300 can be implemented as a software application program 1833 executable by the processor 1805 of the computer system 1800. The method 300 starts at sub-process 310 by capturing the BRDF of the object 110. The output of step 310 is a set of appearance maps and a mesh representing the object surface shape. The appearance maps represent the spatially varying reflective properties of the surface of the object 110, such as a diffuse colour map, a specular amplitude map, a specular roughness map and a normal map. The object surface shape can be represented using a 3D mesh comprising polygons, together with a parametrisation which provides a mapping from the mesh polygon vertices to 2D co-ordinates in the appearance maps.
The sub-process 310 of capturing and estimating BRDF is detailed in Fig. 4.
The method 300 then proceeds from sub-process 310 to step 320.
In a rendering step 320, the processor 1805 renders the virtual object (corresponding to the object 110). The rendered virtual object has surface appearance that is rendered based on the BRDF captured by sub-process 310.
In rendering step 320, the appearance maps and the object shape data can be used by an image renderer to compute the red, green and blue (R,G,B) value at each rendered image pixel location in a given simulated lighting environment according to a given virtual camera position and view angle with respect to the virtual object (which corresponds to the object 110). Each pixel is calculated by casting a ray from the virtual camera sensor through the virtual camera lens and finding the first intersection point with the object shape. The intersection point is located on a point within a polygon in the 3D mesh. The parametrisation is used to look up the corresponding appearance data at the intersection point from the appearance maps. The planar orientation of the intersection polygon is used together with 3D normal angles from the normal map to calculate a 3D normal angle at the intersection point. A 3D view angle from the intersection point to the virtual camera pixel is calculated. A 3D lighting angle from the intersection point to a light in the simulated lighting environment is calculated.
The reflected intensity for each RGB colour channel is calculated using a BRDF model for reflected diffuse and specular intensity where the model uses the 3D normal angle, the 3D view angle, the 3D lighting angle and the appearance data corresponding to the intersection point, including the (R, G, B) diffuse colour, the specular amplitude and the specular roughness. The reflected intensity is then added to the RGB intensity value for the virtual camera pixel being raycasted. The raycasting process is repeated for each light source and each virtual camera pixel.
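The following sketch illustrates how the appearance-map values combine with the normal, view and lighting angles at one intersection point. A Blinn-Phong-style diffuse-plus-specular lobe is assumed, and the mapping from roughness to a specular exponent is an illustrative modelling choice rather than the model mandated by the disclosure.

```python
import numpy as np

def shade(normal, view_dir, light_dir, diffuse_rgb, spec_amp, roughness,
          light_rgb=np.ones(3)):
    """Return the reflected RGB intensity at one surface point."""
    n = normal / np.linalg.norm(normal)
    v = view_dir / np.linalg.norm(view_dir)
    l = light_dir / np.linalg.norm(light_dir)

    # Diffuse (Lambertian) term.
    n_dot_l = max(np.dot(n, l), 0.0)
    diffuse = np.asarray(diffuse_rgb) * n_dot_l

    # Specular term driven by the half-vector; a smaller roughness is mapped
    # to a larger exponent, i.e. a sharper highlight (illustrative mapping).
    h = (v + l) / np.linalg.norm(v + l)
    exponent = 2.0 / max(roughness, 1e-3) ** 2
    specular = spec_amp * max(np.dot(n, h), 0.0) ** exponent

    return light_rgb * (diffuse + specular)
```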
The light sources may be point lights, area lights or an environment map. Area lights can be represented as a set of point light sources covering the area of the light. Environment maps can be represented as light sources covering a sphere at infinite distance from the virtual object. Using raycasting, a realistic rendered image of the virtual object can be calculated for any virtual camera position, including spatially varying texture, gloss and mirror-like reflection effects, as long as the appearance maps and object shape data are realistic.
The method 300 terminates at the conclusion of step 320.
The discussion now turns to sub-process 310 and Fig. 4.
The sub-process 310 can be implemented as a software application program 1833 executable by the processor 1805 of the computer system 1800.
The sub-process 310 starts with step 410 by capturing a set of images. The set of images are captured by a camera of the system 100 from the camera positions 120, 130, 140 (and possibly further different camera positions) and are received by a processor 1805 (as shown in Fig. 18A).
The images captured at step 410 are images of the surface of the object 110 when a reference marker has been placed around the object 110. Method 700 as described in Fig. 7 is a method that can be implemented in step 410.
An example of the reference marker is shown in Fig. 5. The reference marker 500 includes a high bandwidth colour alignment pattern 600 (see Fig. 6) in a rectangular band. The alignment pattern 600 is located in a pattern region 560 of the reference marker 500. The reference marker 500 also includes a white rectangular band 595 outside the pattern region 560. Further, the reference marker 500 includes fiducial markers 580 to assist in approximating camera pose. The reference marker 500 is initially generated by a computer. The initially generated reference marker 500 will be referred to hereinafter as the ideal image marker 500.
The ideal image marker 500 is then printed using a high resolution printer, such as an inkjet printer, on a medium to create a printed marker 500. The region 570 inside the rectangular marker band (i.e., the pattern region 560) is cut out using a scalpel, creating a cut-out region (i.e., a void) 570 through which the object 110 can be seen. The marker 500 is printed on a deformable medium, such as high weight matte paper. The ideal image marker 500 and the printed marker 500 differ due to the scaling that occurs during printing. The printed marker 500 is placed on the object 110 being scanned, creating a placed printed marker 500, such that the rectangular marker band 560 of the printed marker 500 surrounds the surface of the object 110 to be captured. Because the marker 500 is printed on deformable media, the marker 500 conforms to the approximate shape of the surface of the object 110. The marker 500 provides detectable reference points in three dimensions. The detectable reference points can then be aligned between captured images to reconstruct the camera poses and a 3D model of the object 110 being scanned.
The reference marker 500 is further discussed below in relation to Fig. 5. The marker 500 can exist in different forms, such as an image on a computer display, a printed version on a medium, and the like. For simplicity's sake, the reference numeral 500 is used for the different forms of the marker 500.
In one arrangement, a point light source such as a flash is positioned in a fixed location and is pointed towards the object 110 to illuminate the object 110 for each image capture. For example, the light source is fixed at a distance of 2.3 m and an elevation angle of 72 degrees with respect to the centre of the surface of the object 110. The light source allows capturing specular information about the surface of the object 110. A set of images of the printed marker 500 and the surface of the object 110 are captured using the flash during which the camera position and view direction is changed (e.g., camera position 120, 130, 140) between each image capture, but the positions of the printed marker, the surface of the object 110, and the flash are all kept constant. In other words, in this arrangement, the camera positions 120, 130, 140 are varied when capturing images of the object 110, while the other components (e.g., the object 110, the printed placed marker) are stationary.
The output of step 410 is a set of multiple images of the object 110 with the marker 500 captured from multiple viewing angles.
Sub-process 310 proceeds from step 410 to step 420 to align the set of images captured at step 410 with the ideal image marker 500. As discussed before, the ideal image marker 500 is the image of the marker 500 before the marker 500 is printed. For each captured image, an approximate camera pose is detected using the fiducial markers 580 near the corners of the reference marker 500. The approximate camera pose includes the camera position and camera view direction with respect to the printed marker 500, and the focal parameter which is the camera focus distance along the optical axis between the nodal point of the camera lens and the camera image sensor.
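A sketch of this approximate pose recovery using cv2.solvePnP is shown below. Detection of the fiducial marks themselves is not shown, and the marker dimensions, detected pixel coordinates and focal length guess are illustrative values that step 430 would subsequently refine.

```python
import cv2
import numpy as np

# Known positions of the four fiducial centres on the flat printed marker,
# in millimetres on the marker plane (z = 0); illustrative dimensions.
marker_points = np.array([[0, 0, 0], [200, 0, 0],
                          [200, 150, 0], [0, 150, 0]], dtype=np.float64)

# Corresponding 2D detections in one captured image (pixels), e.g. from a
# checkerboard-corner or ArUco detector; illustrative values.
image_points = np.array([[812.4, 604.1], [2311.7, 633.8],
                         [2285.2, 1744.6], [795.9, 1718.3]], dtype=np.float64)

h, w = 3000, 4000                        # image size of the capture
f_guess = 1.2 * w                        # rough focal length in pixels
K = np.array([[f_guess, 0, w / 2],
              [0, f_guess, h / 2],
              [0, 0, 1]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(marker_points, image_points, K, None,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)               # rotation from marker to camera frame
camera_position = (-R.T @ tvec).ravel()  # camera centre in marker coordinates
```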
In addition, a set of high accuracy 2D point correspondences are detected which map reference 2D points in the ideal marker image 500 to detected 2D points of the marker in the captured image. The point correspondences are detected with sub-pixel accuracy.
In addition, a set of approximate 3D detected marker points, also called feature points, are estimated corresponding to the world co-ordinates of reference points on the placed printed marker 500. This may be achieved using a marker containing uniquely identifiable feature points. A number of detected marker points of 400 or more was found to be suitable.
Step 420 implements the method 900, which is described below in relation to Fig. 9.
The output of step 420 is the approximate camera poses, the approximate 3D detected marker points and the set of high accuracy 2D point correspondences for each captured image.
Sub-process 310 proceeds from step 420 to step 430 to determine high accuracy camera poses. The approximate camera poses, the approximate 3D detected marker points and the set of high accuracy 2D point correspondences for each captured image are input to a bundle adjustment algorithm, which varies the approximate points and poses using a camera projection model for each captured image. The bundle adjustment algorithm has an optimisation goal to minimise the least-squares reprojection error over all of the point correspondences in all of the captured images. Optimisation stops when the reprojection error converges or reaches a target accuracy. For example, the optimisation may converge to a root mean square reprojection error of 0.08 pixels.
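A compact sketch of such a refinement using scipy.optimize.least_squares is given below. It jointly adjusts the per-image poses and the 3D marker points to minimise the summed reprojection error; lens distortion is ignored and a single shared camera matrix K is assumed for brevity.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def residuals(params, n_cams, n_pts, K, cam_idx, pt_idx, observed_2d):
    """Reprojection residuals for all 2D observations, flattened."""
    poses = params[:n_cams * 6].reshape(n_cams, 6)       # rvec (3) + tvec (3)
    points = params[n_cams * 6:].reshape(n_pts, 3)
    res = []
    for c, p, uv in zip(cam_idx, pt_idx, observed_2d):
        rvec, tvec = poses[c, :3], poses[c, 3:]
        proj, _ = cv2.projectPoints(points[p].reshape(1, 3), rvec, tvec, K, None)
        res.append(proj.ravel() - uv)
    return np.concatenate(res)

def bundle_adjust(init_poses, init_points, K, cam_idx, pt_idx, observed_2d):
    """init_poses: (n_cams, 6); init_points: (n_pts, 3). cam_idx, pt_idx and
    observed_2d give, for each observation, which camera saw which point and
    where (pixels). Returns refined poses and points."""
    x0 = np.hstack([init_poses.ravel(), init_points.ravel()])
    sol = least_squares(residuals, x0, method="trf", x_scale="jac",
                        args=(len(init_poses), len(init_points), K,
                              cam_idx, pt_idx, observed_2d))
    n6 = len(init_poses) * 6
    return sol.x[:n6].reshape(-1, 6), sol.x[n6:].reshape(-1, 3)
```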
The output of step 430 is a set of high accuracy camera poses corresponding to the camera position and view direction for each captured image, and a set of high accuracy 3D points corresponding to the reference points on the placed printed marker 500. Although the output poses and points have high accuracy, at this step the 3D point units are scale-free, in that the 3D points and camera positions could be scaled by any scalar constant and still be consistent with the input to step 430.
Therefore, the camera pose for each of the captured images is determined based on the alignment of the corresponding regions of the captured images (i.e., the set of high accuracy 2D point correspondences).
Sub-process 310 proceeds from step 430 to step 440 which performs a surface fit to obtain an approximated shape of the object 110 based on the high accuracy 3D points (obtained at step 430). The high accuracy 3D points and the high accuracy camera poses are scaled to a physical length by comparing the local geodesic distances of the 3D detected points on the placed printed marker 500 and the physical distances between the 2D reference points on the flat printed marker 500. For example, the high accuracy 3D points can be expressed in units of printed marker pixels, or in units of metres. The scaled 3D points are used to fit a low-order polynomial surface to the placed printed marker 500. The approximate material surface shape can then be interpolated across the marker cut-out region 570 using the low-order polynomial. Alternatively, the scaled 3D points are used to fit a low-order spline surface.
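The polynomial variant of this fit can be sketched as a linear least-squares problem, assuming the surface can be written as a height field z = f(x, y) in the marker's local frame; the quadratic degree and grid resolution below are illustrative.

```python
import numpy as np

def design_matrix(x, y):
    # Monomials of a degree-2 polynomial in x and y.
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

def fit_surface(points_3d):
    """points_3d: (N, 3) scaled marker points. Returns polynomial coefficients."""
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    coeffs, *_ = np.linalg.lstsq(design_matrix(x, y), z, rcond=None)
    return coeffs

def evaluate_surface(coeffs, x_range, y_range, n=200):
    """Interpolate the fitted surface across the cut-out region on an n x n grid."""
    xg, yg = np.meshgrid(np.linspace(*x_range, n), np.linspace(*y_range, n))
    zg = design_matrix(xg.ravel(), yg.ravel()) @ coeffs
    return np.stack([xg, yg, zg.reshape(xg.shape)], axis=-1)   # (n, n, 3) points
```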
The outputs of step 440 are a set of scaled 3D detected points, a set of scaled camera poses and a set of coefficients from a low-order polynomial fit to the approximated surface shape of the object 110. The polynomial fit may be used to generate a set of dense 3D points estimating the macro-scale surface shape. These dense 3D points may be used to generate a 3D mesh and a macro-scale normal map. The 3D mesh is a set of connected polygons which describe the surface shape. The normal map is a set of 3 values at each position on a pixel grid, where the 3 values correspond to the X, Y, and Z magnitudes of the unit vector perpendicular to the material surface at the corresponding position.
Therefore, a shape of the object 110 can be determined (e.g., using a low-order polynomial, low-order spline surface) based on a 3D alignment method using the alignment of the corresponding regions of the captured images (i.e., the set of high accuracy 2D point correspondences).
Sub-process 310 proceeds from step 440 to step 450 to parametrise the surface (i.e., the shape of the object 110). The coefficients from the low-order polynomial surface fit are used to generate an approximate 3D point cloud of the surface of the object 110 such that the centroid of the 3D point cloud is placed at the co-ordinates (0, 0, 0). A parametrisation is estimated which is a set of correspondences between points in the 3D point cloud of the surface shape of the placed printed marker 500 and points in a 2D rectangular regularly spaced pixel grid in a 2D representation of the printed marker 500 as if the marker 500 were flat. The parametrisation and the low-order polynomial fit are used to calculate an approximate point cloud on a pixel grid which corresponds to the pixel grid used to generate the reference marker 500. The output of step 450 is a pixel grid parametrisation from the 3D surface fit to a two-dimensional pixel grid, and an approximate 3D point cloud.
Sub-process 310 proceeds from step 450 to step 460 to project textures. The high accuracy camera poses (from step 430), the approximate 3D point cloud (from step 450) and the pixel grid parametrisation (from step 450) are used to create a mapping from the texture map coordinate system to each captured image. From this mapping, a set of texture maps can be generated. Each texture map corresponds to a single captured image. Each pixel address in the pixel grid parametrisation corresponds to a single point on the surface of the object 110. The set of RGB pixel values at a single pixel address from multiple texture maps corresponds to the captured RGB intensity across multiple views of the corresponding single point on the material surface. The correspondence is approximate because the input 3D point cloud is approximate. The output of step 460 is a set of approximate texture maps.
Sub-process 310 proceeds from step 460 to step 465 to align the texture maps. A 2D image alignment method is applied to the set of approximate texture maps to create a set of corresponding high accuracy texture maps. After texture alignment, the set of RGB pixel values at a single pixel address from multiple texture maps corresponds precisely to the captured RGB intensity across multiple views of the corresponding single point on the material surface. The captured RGB intensities can vary significantly across different viewpoints for glossy materials. In one arrangement, the alignment is performed using a method known as covariance-based mutual information (CMI), which is able to find correspondences that are robust to these intensity changes. The output of step 465 is a set of high accuracy texture maps.
In step 465, texture map alignment may additionally be used to optimise the approximate point cloud, creating a high accuracy point cloud. In this case, the output of step 465 is a set of high accuracy texture maps and a high accuracy point cloud.
Alternatively, in step 465, texture map alignment is performed by additionally offsetting the point cloud along the normal vector at each point by a distance corresponding to the thickness of the printed marker 500. This approach may be used when the material has a smoothly varying curved surface which the printed marker 500 conforms to tightly, but weak texture that is unsuitable for image-based alignment, in which case the offset point cloud becomes a high accuracy point cloud. The approximate texture maps are then reprojected using the high accuracy point cloud to form a set of high accuracy texture maps. In this case, the output of step 465 is a set of high accuracy texture maps and a high accuracy point cloud.
Sub-process 310 proceeds from step 465 to step 470 to estimate the BRDF. The multi-view images of the white rectangular band 595 from the placed printed marker 500 are extracted from the texture maps and used to estimate a lighting direction. Alternatively, the lighting direction is recorded at the time of capture. The macro-scale normal map and the high accuracy texture maps are used to estimate a BRDF for each pixel in the texture maps. The BRDF includes a diffuse component, which is estimated from minimum or pseudo-minimum values of the high accuracy texture map intensities at each pixel in the texture maps. This diffuse colour estimation is used to create an RGB diffuse map.
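A minimal sketch of this diffuse estimate is shown below, using a low percentile over the aligned views as a pseudo-minimum that suppresses specular highlights while remaining robust to sensor noise; the percentile value is an illustrative choice.

```python
import numpy as np

def estimate_diffuse_map(texture_maps, percentile=10):
    """texture_maps: (num_views, H, W, 3) stack of aligned RGB texture maps.
    Returns an (H, W, 3) diffuse map from the per-texel pseudo-minimum."""
    return np.percentile(texture_maps, percentile, axis=0)
```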
For each pixel in the high accuracy texture maps, a specular normal direction is estimated which is consistent with the observations of the pixel in the multi-view texture map. That is, the brightest samples in the multi-view texture map at a pixel have camera angles which substantially correspond to the reflection of the light source direction by the local surface orientation, as defined by the estimated specular normal direction.
A specular amplitude and roughness are estimated by fitting a BRDF model to the multi-view texture map intensities as a function of the half-angle, which is the dot product of the normal direction and the lighting direction. The fit is used to create a specular amplitude map and a specular roughness map.
The output of step 470 is a specular normal map, a specular amplitude map, a specular roughness map and a diffuse map. After step 470, sub-process 310 ends. The outputs from sub-process 310, together with the 3D mesh produced at step 440, can be used for rendering the object 110 including a surface BRDF with realistic appearance.
Sub-process 310 therefore provides a means for capturing shape and spatially varying BRDF of an object 110 with a convenient capture process using a standard handheld camera. The sub-process 310 produces high accuracy results and is accessible to a wide range of users because it does not require prior camera calibration, multiple cameras, a calibrated camera rig, or specialised depth measurement equipment.
Marker generation
Fig. 5 is a diagram showing a reference marker 500 to assist in the alignment of camera images. Fig. 6 is a diagram showing an example of a general layout of an alignment pattern 600 as located on the marker 500.
The marker 500 is printed on a flexible substrate, such as paper or flexible plastic, so that a print-out of the marker 500 can be laid on a non-flat object (e.g., 110) to be scanned, substantially in conformation with the shape of the object 110 at a scale comparable to the marker size. The marker 500 can also be printed on magnetic paper, for easy placement on curved objects made of magnetic materials such as steel.
The marker 500 has a pattern region 560 printed with a high bandwidth alignment pattern 600, an unprinted border region 595, an opening 570 (which is referred to as the cut-out region in the
discussion of sub-process 310 above) through which the object 110 to be scanned can be seen, and a set of fiducial marks 580 printed in the corner regions instead of the pattern 600. An example of suitable fiducial marks 580 is a 2x2 checkerboard pattern of black and white squares, although other fiducial marks may be used. One of the fiducial marks 590 is printed in a distinguishing manner, such as being rotated relative to the other fiducial marks 580, so that the fiducial mark 590 may be identified uniquely.
Other alternative conformations of the marker 500 are possible. In an alternative conformation, all of the fiducial markers 580 are uniquely identifiable markers, such as data-encoded two-dimensional ArUco barcode markers.
The high bandwidth alignment pattern 600 on the marker 500 provides uniquely identifiable feature points to assist in aligning images containing the marker 500. In general, the alignment pattern 600 on the marker 500 is formed by the combination of multiple overlapping layers of tiles. Each layer includes square or rectangular tiles that tile the layer without interruption. Each tile is made up of a plurality of pixels, referred to as chart pixels. Each chart pixel of each tile may be represented by a set of number values indicating a colour in some colour space, such as the RGB colour space. When the marker 500 is printed, each chart pixel may be printed with multiple printer resolution pixels.
The pattern 600 (as shown in Fig. 6) includes two tiled layers, the first layer being a tiling of first tiles 610 and the second layer being a tiling of second tiles 620. Fig. 6 shows that the first tiles 610 are periodically produced although only one of the first tiles 610 is shaded. The first tiles 610 and the second tiles 620 are of different sizes, where the dimensions of the two tiles 610 and 620 in pixels are coprime in both the respective horizontal and vertical dimensions. The first tiles 610 tile the first layer as shown by the long-dashed lines 650 indicating tile edges. For ease of identifying one of the first tiles 610, the area of one of the first tiles 610 is shaded by a pattern of thin parallel diagonal lines going from bottom left to top right. The second tiles 620 tile the plane as shown by the short-dashed lines 660 indicating tile edges. For ease of identifying one of the second tiles 620, the area of one of the second tiles 620 is shaded by a pattern of thin parallel diagonal lines going from top left to bottom right. For clarity, the long-dashed lines 650 and the short-dashed lines 660 are not shown extended throughout the figure, but the tiles 610 and 620 do extend throughout the respective first and second layers.
The first tile 610 is generated by assigning pseudo-random values to each chart pixel in a single colour channel. The second tile 620 is generated in a similar fashion, but with a different colour channel. The alignment pattern 600 is formed from all of the overlapping tiled layers. The colour value of each chart pixel in the pattern 600 is determined from a combination of the number values by adding the colour channel values of the corresponding chart pixels of each tiled layer and then normalising to a range between black and white. The resulting high bandwidth alignment pattern 600 is composed with other elements to form an image of the marker 500, suitable for printing.
Because each tile dimension in the first layer is coprime with another tile dimension in the second layer, the overlap of the two layers in the marker 500 prevents the visible pattern on the marker 500 from repeating frequently. The visible pattern in the marker 500 repeats only after a large number of tiles, which is proportional to the product of the coprime tile dimensions. Thus, for a typical size marker, every location in the alignment pattern 600, to the resolution of the marker pattern, comprises a uniquely identifiable feature point.
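The tiled-layer construction described above can be illustrated with a short sketch. The following Python code is a minimal example rather than the exact pattern of the marker 500: it assumes two layers, each occupying a single colour channel, with illustrative coprime tile dimensions, a fixed pseudo-random seed, and a hypothetical chart size.

```python
import numpy as np

def make_alignment_pattern(height, width, tile_sizes=((31, 37), (41, 43)), seed=0):
    """Build a two-layer alignment pattern: each layer tiles the plane with a
    pseudo-random single-channel tile, and the tile dimensions are coprime
    between layers so the combined pattern repeats only after many tiles."""
    rng = np.random.default_rng(seed)
    pattern = np.zeros((height, width, 3))
    for channel, (th, tw) in enumerate(tile_sizes):
        tile = rng.random((th, tw))                      # pseudo-random chart pixel values
        reps = (height // th + 1, width // tw + 1)
        pattern[:, :, channel] = np.tile(tile, reps)[:height, :width]  # tile without interruption
    # Combine the layers and normalise to the black-to-white range [0, 1]
    pattern -= pattern.min()
    pattern /= pattern.max()
    return pattern

chart = make_alignment_pattern(512, 512)   # a 512 x 512 chart-pixel pattern
```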
The tile dimensions in the first layer being coprime with another tile dimension in the second layer is not essential. In another arrangement, the tile dimension in the first layer is not coprime with another tile dimension in the second layer.
The high bandwidth alignment pattern 600 has a high spatial frequency bandwidth, which enables high accuracy sub-pixel marker alignment (with respect to chart pixels) in captured images of the placed printed marker. The high bandwidth alignment pattern 600 is preferably generated and printed using a diversity of colours, which enhances the accuracy and reliability of non-rigid alignment using CMI alignment.
Image capture
As described above, step 410 captures images. A method 700 as shown in Fig. 7 is a method of performing the image capture.
Method 700 begins with step 710 to place the printed marker 500 on a surface of the object 110. The placed marker 500 is shown in Fig. 8, with the printed marker 500 placed on top of the surface 820 of the object 110. The surface 820 is visible through the cut-out region 570 in the printed marker 500. After the marker 500 is placed, the position of the marker 500 and the object 110 is fixed for the remainder of method 700.
The properties of the object 110 that influence the appearance can be spatially varying, and the properties can be described on 3 scales. The first scale is the macro scale which changes on the order of 10 mm or larger. An example of a property in the macro scale is the large scale geometry of the material shape.
The second scale is the meso scale, which changes on the order of 0.1 to 1 mm and can be directly perceived by human vision. Examples of properties in the meso scale are fine changes in colour or small bumps, ridges or valleys in the surface shape.
The third scale is the micro scale, which changes on the order of 0.01 mm or smaller and cannot be directly perceived by human vision, but still influences the appearance of an object. An example of a property in the micro scale is the micro scale surface roughness which influences the sharpness or blurriness of specular reflections.
The method 700 (and the other associated methods described herein, such as sub-process 310 and method 300) is suitable for a material surface 820 which has a curved or flat macro scale shape but without significant discontinuities or occlusions. The meso scale properties can include small to medium bumpiness, and the object 110 may or may not have diffuse colour contrast. The shape of the object 110 should have low enough macro scale curvature and small enough meso scale angular and height variation such that the entire surface 820 within the cut-out region 570 is visible from all camera views. The micro scale properties can include matte, sheen or glossy finish, with isotropic or anisotropic specular reflections.
Method 700 continues from step 710 to step 720 to set camera parameters. A high resolution still camera is used, such as a DSLR, mirrorless camera or smartphone with a built-in camera. The camera parameters are set to capture images of the placed marker 500 and object 110 with high resolution and contrast, low noise, and high depth of field. The white balance is set manually, for example the white balance is set to flash illumination. The camera can be hand held or placed on a tripod. The camera can be used in auto-focus or manual focus mode.
Method 700 continues from step 720 to step 730 to set flash power. The capturing area has much lower light levels than the flash, for example the images are captured in a room with the ceiling lights switched off. A flash is placed in a fixed position pointing at the placed marker 500, for example the flash is placed on a tripod. The camera is set to flash sync mode and the flash power is controlled to give a well exposed image, without saturated pixels. If the camera viewpoint gives an image with strong specular reflections, then a flash bracket may be taken, by
capturing 2 images from a similar viewpoint and the same camera parameters but with different flash power settings. The camera can be hand held and the camera viewpoints in the images in the flash bracket do not have to be identical. The image with higher flash power may have saturated pixels, but the same object points will be well exposed in the image with lower flash power. The camera and flash parameters used to capture each image are recorded as metadata attached to the image file. Alternatively, the camera and flash parameters for each image may be manually recorded in a spreadsheet by the operator.
Method 700 continues from step 730 to step 740 to capture images of the placed marker 500 and object 110. Each image is captured at a different viewpoint and saved in a linear intensity format, such as a RAW file. For each captured image, the viewpoint is selected to approximately evenly sample a range of altitude and azimuthal angles around and including a central top-down view of the placed marker. For example, a set of 62 images are captured in a set of 3 expanding rings at 3 different elevations around the placed marker, plus a central top down image, covering elevation angles from 45 degrees to 90 degrees and azimuthal angles from 0 to 360 degrees.
Method 700 continues from step 740 to step 750 where a decision is made whether to capture more images. If more images need to be captured to approximately sample a range of viewpoint angles (YES), then the method 700 continues from step 750 to step 720. Otherwise (NO), the method 700 ends.
Marker alignment
Fig. 9 is a flow diagram of a marker alignment method 900 of aligning each of the captured images of the marker 500 with the ideal marker image 500. The method 900 receives as input a camera image containing an image of the marker 500 placed on an object 110 to be scanned. The method 900 also receives the ideal marker image 500 as input. The captured images are then aligned with the ideal marker image 500. The method 900 is implemented in step 420.
The method 900 can be implemented as a software application program 1833 executable by the processor 1805 of the computer system 1800.
The method 900 commences at step 910. In a fiducial detection step 910, the fiducial markers 580 are detected in the captured image. In one arrangement, the uniquely identifiable fiducial marker 590 is detected and used to determine the orientation of the marker 500 in the captured
image. In an alternative arrangement, all of the fiducial markers 580 are uniquely identifiable and are used to determine the orientation of the marker 500 in the captured image. The method 900 proceeds from step 910 to step 920.
In an approximate pose step 920, the determined orientation of the marker 500 in the captured image is used to determine a homography relating the ideal marker image 500 to the captured image, corresponding to the pose of the camera when capturing the image. This determination can be done using algorithms such as PnP (Perspective n-Point) solving in open source computer vision software (e.g., in OpenCV). The method 900 proceeds from step 920 to step 930.
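The following OpenCV sketch illustrates one way step 920 could be implemented, assuming the four fiducial marks have already been detected and matched to their known positions on the ideal marker image; the marker coordinates, image coordinates and camera intrinsics shown are illustrative values only, not values specified elsewhere in this description.

```python
import cv2
import numpy as np

# Hypothetical fiducial correspondences: known positions on the ideal (z = 0) marker
# and their detected positions in one captured image.
marker_pts = np.array([[0, 0, 0], [200, 0, 0], [200, 200, 0], [0, 200, 0]], dtype=np.float64)
image_pts = np.array([[812, 640], [2410, 655], [2390, 2205], [830, 2190]], dtype=np.float64)

K = np.array([[3000.0, 0.0, 2000.0],      # assumed pinhole intrinsics
              [0.0, 3000.0, 1500.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)                         # lens distortion neglected for the initial estimate

ok, rvec, tvec = cv2.solvePnP(marker_pts, image_pts, K, dist)
R, _ = cv2.Rodrigues(rvec)                 # rotation matrix of the approximate camera pose

# For a planar marker in the z = 0 plane, the pose induces a homography that maps
# ideal marker coordinates to captured image coordinates.
H = K @ np.column_stack((R[:, 0], R[:, 1], tvec.ravel()))
H /= H[2, 2]
```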
In a non-rigid alignment step 930, a non-rigid alignment method is used to register the pattern 600 in the captured image to the ideal marker image 500, using the homography determined in step 920 as an initial registration estimate. A suitable non-rigid alignment method is CMI alignment, which is described below. CMI alignment has improved accuracy and reliability for textures with colour variation, such as the pattern 600. CMI alignment is more reliable when using an initial registration estimate. The registration of the captured images with the ideal marker image 500 produces aligned images, which have been corrected for distortions such as perspective and barrel distortions during the registration process.
The method 900 proceeds from step 930 to step 940.
In a feature point step 940, feature points in the alignment pattern 600 of the aligned images are detected with high spatial accuracy. The marker alignment method 900 then ends at the conclusion of step 940.
Fig. 10 is a flow diagram of a feature point identifying method 1000 of identifying a feature point in the marker 500 of the aligned images, which can be implemented in feature point step 940. The method 1000 receives as input an aligned image, which is a captured image that has been aligned to the ideal marker image 500 using CMI alignment (at step 930), and which has been thereby corrected for distortions such as perspective and barrel distortions. The method 1000 commences at step 1020.
The method 1000 can be implemented as a software application program 1833 executable by the processor 1805 of the computer system 1800.
In window size step 1020, the processor 1805 determines a window size over which the subsequent analysis is to be performed. In one arrangement, this window size corresponds to 64 chart pixels square, which may be more than 64 image pixels square depending on the resolution of the marker printing process and the magnification and resolution of the camera images. In the general case, the window size should be approximately the size of the tiles, but generally not smaller than 64 chart pixels square. The method 1000 proceeds from step 1020 to 1030.
In window extraction step 1030, a window of the alignment pattern 600 on the aligned camera image is extracted from the aligned camera image. In other words, a portion of the alignment pattern 600 is extracted from the aligned camera image. The method 1000 proceeds from step 1030 to 1040.
In tile choosing step 1040, a tile is chosen from the ideal marker image 500, for example the first tile 610. The selected tile is upscaled to match the scale at which the tile appears in the aligned image. The method 1000 proceeds from step 1040 to 1050.
In correlation step 1050, the chosen tile is correlated with the window extracted at step 1030, whereby the upscaled tile and the window images are compared with different pixel offsets to determine which offset gives the best match between the upscaled tile image and the window image. The correlation is performed using the tile and the matching colour channel of the window with which that tile was formed in the marker pattern 600. The output of the correlation step 1050 is a correlation image having pixel values that correspond to the strength of the correlation for the offset corresponding to that pixel. Examples of methods of correlation and matched filtering are cross-correlation, zero-normalised cross-correlation, phase correlation, and the like.
The method 1000 proceeds from step 1050 to step 1060.
In peak detection step 1060, the correlation image is searched for the highest peak and the location of that peak. The location of the peak specifies the two-dimensional offset between the tiles and the border of the window. The peak position may be interpolated to determine its position (and hence the two-dimensional offset) more precisely. For example, for the first tiles 610, the two-dimensional offsets provide the offsets of the peaks of each of the first tiles 610 from the border of the extracted window.
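One possible implementation of the correlation and peak detection of steps 1050 and 1060 is sketched below, using an FFT-based circular cross-correlation and parabolic peak interpolation. The specific correlation and interpolation choices are assumptions for illustration; the other correlation and matched filtering methods listed above may equally be used.

```python
import numpy as np

def correlate_tile_with_window(window, tile):
    """Correlate a tile against an extracted window (FFT-based circular
    cross-correlation of zero-mean inputs) and locate the correlation peak with
    parabolic sub-pixel interpolation, returning the offset modulo the tile size."""
    h, w = window.shape
    reps = (h // tile.shape[0] + 1, w // tile.shape[1] + 1)
    tiled = np.tile(tile, reps)[:h, :w]          # the pattern layer is periodic, so tile it
    a = window - window.mean()
    b = tiled - tiled.mean()
    corr = np.real(np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))))
    py, px = np.unravel_index(np.argmax(corr), corr.shape)

    def parabolic(c, i, n):
        lo, hi = c[(i - 1) % n], c[(i + 1) % n]
        denom = lo - 2.0 * c[i] + hi
        return i + 0.5 * (lo - hi) / denom if denom != 0 else float(i)

    off_y = parabolic(corr[:, px], py, h) % tile.shape[0]
    off_x = parabolic(corr[py, :], px, w) % tile.shape[1]
    return off_y, off_x
```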
This correlation method can produce high accuracy shift estimates with sub-pixel accuracy of 0.01 pixels in well exposed captured images. This high accuracy can be achieved when the input images contain high contrast textures at high spatial frequencies, and when image shift is the only registration error between the 2 input images to the correlation. The high bandwidth alignment pattern 600 supplies high contrast at high spatial frequencies. CMI alignment in step 930 performs non-rigid alignment with an accuracy of 0.1 to 1 pixels which accounts for image distortions including shift, rotation, scale, lens barrel distortion and projective distortion, and therefore the only significant registration error remaining in the inputs to correlation is sub-pixel image shift. The method 1000 proceeds from step 1060 to step 1070.
In tile decision step 1070, a decision is made on whether more tiles exist. If more tiles exist that have not yet been correlated with the window (YES), processing returns from step 1070 to tile choosing step 1040. If all the tiles have been considered (NO), processing continues from step 1070 to step 1080 by passing the two-dimensional offsets, in the respective tile's coordinate system, for each tile 610, 620 to offset determination step 1080.
Therefore, during the first pass, steps 1040 to 1070 of the method 1000 determine the two-dimensional offsets for the first tiles 610 with respect to the border of the extracted window. During the second pass of steps 1040 to 1070, the two-dimensional offsets for the second tiles 620 with respect to the border of the extracted window are determined. The loop is repeated for all the tiles. Hence, if there are 3 tile sizes, then steps 1040 to 1070 are repeated 3 times.
Therefore, at the completion of the loop of steps 1040 to 1070, the two-dimensional offsets of all the tiles 610, 620 with respect to the border of the extracted window are determined. The method 1000 proceeds from step 1070 to step 1080.
In offset determination step 1080, the position of the extracted window of the aligned image with respect to the ideal marker image 500 is determined to sub-pixel accuracy. As the first tiles 610 and the second tiles 620 are of different sizes, the difference in two-dimensional offsets of the tiles 610 and 620 is unique at each different area of the ideal marker image 500. Therefore, determining the difference in two-dimensional offsets between the tiles 610 and 620 determines the position of the extracted window with respect to the ideal marker image 500.
The two-dimensional offsets between the border of the extracted window and each of the tiles 610, 620 determine the position of the centre of the extracted window with respect to the ideal
marker image 500. Each two-dimensional offset represents the position of the centre of the extracted window in tile coordinates modulo the size of the tile that it was correlated against. Each two-dimensional offset has an integer part corresponding to a whole-pixel offset and a non-integer part corresponding to a sub-pixel offset. The sub-pixel part of the offset should be the same for all the tiles 610, 620. However, because CMI alignment and the correlation process are not perfect in the presence of imaging noise, the sub-pixel offsets are generally slightly different. This is accounted for by averaging the sub-pixel offsets from the common rounded nearest integer pixel position in the reference marker coordinate system. For example, if two offsets are 4.9 and 5.1, the non-integer parts are 0.9 and 0.1 respectively, but these are to be averaged with respect to the nearest common integer of 5, resulting in an average non-integer part of 0.0, rather than 0.5.
The horizontal and vertical dimensions may be analysed independently, so without loss of generality only the horizontal coordinates are described here; the vertical coordinates are generated using the same process. The following description assumes there are 3 different tiles at the selected scale, but may be extended to any number of tiles greater than 1 in a similar manner. If the horizontal offsets for each of the tiles are denoted O1, O2, and O3 for tiles of horizontal size N1, N2, and N3 respectively, then the rounded integer parts of the horizontal positions may be written as:
R1 = floor(O1 + 0.5)
R2 = floor(O2 + 0.5)        Equation (1)
R3 = floor(O3 + 0.5)
and the fractional remainders are:
F1 = O1 - R1
F2 = O2 - R2        Equation (2)
F3 = O3 - R3
Given these definitions, the nearest integer horizontal position of the centre of the window in aligned camera image coordinates X is a number such that:
X = O1 mod N1
X = O2 mod N2        Equation (3)
X = O3 mod N3
Equations (1) to (3) form a system of equations which is a specific form of the Chinese Remainder Theorem for three remainders. Examples of methods of solving the Chinese Remainder Theorem are systematic searching, sieving, calculation using Bézout's identity, and the like. Using one of these methods leads to the solution X which is the nearest integer horizontal position of the centre of the window with respect to the reference marker image's coordinate system. The horizontal position given to sub-pixel accuracy is:
Hx = X + (F1 + F2 + F3)/3        Equation (4)
A further application of the Chinese Remainder Theorem to the vertical dimension and addition of the average of the fractional pixel part leads to the vertical position of the centre of the window with respect to the reference marker image's coordinate system.
Thus an accurate estimate of both the horizontal and vertical components of the position of the extracted window with respect to the ideal marker image 500 has been determined. The method 1000 concludes at the conclusion of step 1080.
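A minimal sketch of the offset determination of step 1080 for one axis is given below. It assumes the rounded integer offsets act as the remainders in the system of Equations (1) to (3) and solves the system by systematic searching; sieving or Bézout's identity could be substituted. The tile widths and offsets shown are hypothetical values chosen for illustration.

```python
import math

def window_position(offsets, sizes):
    """Recover the nearest-integer window position X from per-tile offsets by
    systematic search over the Chinese Remainder system (Equations (1)-(3)),
    then add the averaged fractional parts (Equation (4))."""
    rounded = [math.floor(o + 0.5) for o in offsets]            # Equation (1)
    fractions = [o - r for o, r in zip(offsets, rounded)]       # Equation (2)
    for x in range(math.prod(sizes)):                           # Equation (3), by searching
        if all(x % n == r % n for r, n in zip(rounded, sizes)):
            return x + sum(fractions) / len(fractions)          # Equation (4)
    raise ValueError("offsets are inconsistent with the tile sizes")

# Example with hypothetical coprime tile widths 11, 13 and 17: the offsets below
# are consistent with a window centre at X = 38, and the fractional parts average to 0.
print(window_position([4.9, 12.1, 4.0], [11, 13, 17]))   # 38.0
```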
The method 1000 is applied repeatedly to obtain a set of windows (as extracted at step 1030) from each aligned camera image which together completely or largely cover the area of the alignment pattern 600 on all of the aligned camera images. A set of approximate 3D detected marker points is formed with x and y coordinates as the positions of the windows extracted by step 1030 with respect to the reference marker image's coordinate system, and a z coordinate of 0. These approximate 3D detected marker points lie along the z = 0 plane; however the placed marker is typically not planar, as it conforms substantially to the shape of the surface of the object 110. Later steps resolve this discrepancy to produce high accuracy 3D points. A set of high accuracy point correspondences between the high accuracy 2D points in each captured image and the approximate 3D detected marker points is determined using the aligned camera image windows. In other words, once the positions of the extracted windows of the images are known, corresponding regions (which have the extracted windows) of the captured images can be aligned.
In some cases, in one or more captured images, the aligned camera image windows (as extracted at step 1030) may refer to co-ordinates that are well outside the ideal marker image 500 or a significant distance from the expected position in the ideal marker image 500 determined using non-rigid alignment in step 930. These invalid points are detection errors, which may be caused by image noise, image saturation, defocus, strong projective distortion or
alignment errors. The invalid points are flagged as false matches, and dropped from the set of high accuracy point correspondences for the corresponding captured images. Further, peak quality measures can be used to drop points, for instance if the height of the peak located by the peak detection step 1060 is not sufficiently large for any of the tile correlations for a window, the associated point is dropped from the set of high accuracy point correspondences. In this way, the marker alignment method is robust to image artefacts and alignment errors, without incurring additional computational costs and reduced accuracy that typically result from numerical outlier rejection methods such as random sample consensus (RANSAC).
Camera pose determination
The high accuracy point correspondences from step 1080 and the approximate camera poses from step 920 are used in step 430 of method 400 to generate high accuracy camera poses and high accuracy 3D detected marker points. In other words, the camera pose for each of the captured images is determined based on the alignment of the corresponding regions of the captured images.
As discussed above, a bundle adjustment algorithm minimises the least-squares reprojection error over all of the point correspondences in all of the captured images. A reprojection error is computed for each marker point for each camera view, the reprojection error being the difference between the observed location of the marker point in an image, and the predicted location of the marker point in the image given the approximated camera pose and marker point position. The predicted location may be computed using a standard pinhole camera model.
A local gradient of a function relating marker point locations to reprojection errors is computed in the neighbourhood of each estimated marker point location. Similarly, a local gradient of a function relating camera poses to reprojection errors is computed in the neighbourhood of each estimated camera pose. A delta is then chosen for each marker point and camera pose which minimises the sum of the squares of the errors relating to that marker point or camera pose respectively. The delta is chosen in the direction of the local gradient, and of a magnitude proportional to the local gradient. A scaling factor may be used to weight marker point adjustments and camera pose adjustments to comparable magnitudes. This process is performed jointly for all marker points and camera views, and is iterated until convergence, or equivalently when the delta on subsequent iterations no longer substantially reduces. Upon convergence, the poses of the cameras and 3D detected marker points have been refined to
consistently match the camera images. The output camera poses and 3D detected marker points thus have high accuracy. This joint optimisation approach is known as bundle adjustment.
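The residual computation at the heart of such a bundle adjustment can be sketched as follows, assuming a standard pinhole camera model with shared intrinsics K, and a parameter vector x0 (hypothetical, not defined here) that packs the approximate camera poses and 3D marker points. A generic least-squares solver then performs the iterative gradient-based refinement described above.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(params, n_cams, n_pts, cam_idx, pt_idx, observed_px, K):
    """Residuals for a minimal bundle adjustment: `params` packs a 6-DoF pose per
    camera (rotation vector and translation) followed by the 3D marker points.
    Each residual is the difference between the predicted pinhole projection of a
    marker point and its observed location in the corresponding image."""
    poses = params[:n_cams * 6].reshape(n_cams, 6)
    points = params[n_cams * 6:].reshape(n_pts, 3)
    rotations = Rotation.from_rotvec(poses[cam_idx, :3])
    cam_coords = rotations.apply(points[pt_idx]) + poses[cam_idx, 3:]   # world -> camera
    proj = cam_coords @ K.T                                             # pinhole projection
    predicted_px = proj[:, :2] / proj[:, 2:3]
    return (predicted_px - observed_px).ravel()

# Joint gradient-based refinement of all poses and points, iterated to convergence:
# result = least_squares(reprojection_residuals, x0,
#                        args=(n_cams, n_pts, cam_idx, pt_idx, observed_px, K))
```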
In an alternative arrangement, the bundle adjustment step optimises not only the 3D marker point positions and camera poses, but also camera calibration intrinsics modelled by the pinhole camera model such as the focal distance, principal point offset, and distortion parameters. These parameters are used when the reprojection error is computed, and jointly adjusted along with the marker points and camera views in each iteration. Scaling factors may be used to weight camera calibration intrinsic parameter adjustments, camera pose adjustments, and marker point adjustments to comparable magnitudes. This alternative arrangement can produce an optimisation which accounts for differences in the camera intrinsic parameters, but is more computationally expensive.
If the texture maps are to be registered using only CMI, approximate camera poses may be sufficient. However, if the 3D structure of the object is to be determined by optimizing the point clouds, highly accurate camera poses are necessary.
Surface fitting, parametrisation and texture projection
As described above, the method 400 proceeds from step 430 to step 440. Step 440 performs surface fitting, which can be implemented by method 1100 as shown in Fig. 11. Method 1100 commences at a scale estimation sub-process 1110 where the scale of the point cloud of high accuracy 3D points (from step 430) is determined relative to the scale of the printed marker 500. As the placed marker 500 deforms to conform to the approximate shape of the surface of the object 110, the scale estimation sub-process 1110 accounts for the deformation of the placed marker 500 as well as the scale-free nature of the high accuracy 3D points. Accordingly, the scale estimation sub-process 1110 determines a scale factor S that would scale the 3D point cloud to have a known unit associated with the printed marker 500, such as printed marker pixels or metres. The scale estimation process 1110 is described in more detail hereinafter. The method 1100 proceeds from step 1110 to step 1120.
A point cloud scaling step 1120 applies the scale S determined at step 1110 to the high accuracy 3D point cloud by multiplying each coordinate of each of the high accuracy 3D points by the scale factor S. The multiplication produces a scaled point cloud of high accuracy 3D points with a known unit. The method 1100 proceeds from step 1120 to step 1130.
A pose adjustment step 1130 adjusts the high accuracy camera poses (from step 430) to maintain the same magnification with respect to the scaled 3D point cloud as the magnification of the camera poses with respect to the 3D point cloud before the point cloud scaling step 1120. The method 1100 proceeds from step 1130 to step 1140.
A surface fitting step 1140 fits a surface to the scaled point cloud:
x_s = fit_x(x_m, y_m)
y_s = fit_y(x_m, y_m)
z_s = fit_z(x_m, y_m)
whereby a surface coordinate (x_s, y_s, z_s) can be evaluated given a reference marker coordinate (x_m, y_m). Accordingly, three separate fitting functions fit_x, fit_y, and fit_z are used to fit the three respective components of the surface x_s, y_s and z_s. At the beginning of the surface fitting step 1140, there is a sparsely known association between the scaled high accuracy 3D points on the placed marker 500 with coordinates (x_p, y_p, z_p) and the positions of the placed marker 500 with coordinates (x_m, y_m).
According to an arrangement, each fitting function is a low-order Legendre polynomial, of degree 4, and each fitting function is stored as a set of coefficients defining the Legendre polynomial.
According to an alternative arrangement, each fitting function is a low-order smooth bivariate spline, of degree 4, and each fitting function is stored as a set of coefficients and knots defining the spline. The resulting fitting functions can be evaluated at the positions of the placed marker 500 other than the original sparse marker positions, and are therefore used to interpolate surface positions.
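The spline arrangement of step 1140 can be sketched with SciPy's smooth bivariate spline, as below. The synthetic sparse points stand in for the scaled high accuracy 3D points and their reference marker coordinates (they are not measured data), and the degree-4 setting mirrors the arrangement described above.

```python
import numpy as np
from scipy.interpolate import SmoothBivariateSpline

# Synthetic sparse data standing in for the detected points: (xm, ym) are reference
# marker coordinates and (xs, ys, zs) are the corresponding scaled 3D coordinates.
rng = np.random.default_rng(1)
xm, ym = rng.uniform(0, 200, 400), rng.uniform(0, 200, 400)
xs, ys, zs = xm + 0.5, ym - 0.3, 0.001 * (xm - 100.0) ** 2   # a gently curved surface

# One degree-4 smooth bivariate spline per output component, mirroring fit_x, fit_y, fit_z.
fit_x = SmoothBivariateSpline(xm, ym, xs, kx=4, ky=4)
fit_y = SmoothBivariateSpline(xm, ym, ys, kx=4, ky=4)
fit_z = SmoothBivariateSpline(xm, ym, zs, kx=4, ky=4)

# The fitted functions interpolate surface positions at marker coordinates other
# than the original sparse positions.
surface_point = (fit_x(120.0, 80.0)[0, 0], fit_y(120.0, 80.0)[0, 0], fit_z(120.0, 80.0)[0, 0])
```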
The method 1100 concludes at the conclusion of step 1140.
The scale estimation sub-process 1110 is described in more detail with reference to Fig. 12. Sub-process 1110 commences at a nearest neighbour finding step 1210, which finds the six nearest neighbours of each of the sparse high accuracy 3D points. If the points are arranged substantially along a grid, then the six nearest neighbours may include two vertically adjacent points along the grid, two horizontally adjacent points along the grid, and two diagonally adjacent points along the grid. For points along the boundary of the marker 500 or in the case of issues such as specular reflections causing some marker regions to be unreadable, the six nearest neighbours may also include some farther points. Neighbours further than 1.4x the minimum neighbour distance for that point are ignored for the purpose of the scale estimation sub-process 1110, so fewer than six nearest neighbours may be found for each point. Sub-process 1110 proceeds from step 1210 to step 1220.
A surface distance measurement step 1220 measures the distance d_surface along the surface of the placed marker 500 between each pair of neighbouring points found by the nearest neighbour finding step 1210. According to one arrangement, this is approximated as the 3D Euclidean distance between the points.
According to another arrangement, surface fitting step 1140 is performed before scale estimation step 1110 instead of after pose adjustment step 1130, and the distance d_surface is calculated as a geodesic distance along the fitted surface.
Sub-process 1110 proceeds from step 1220 to step 1230.
A reference marker distance measurement step 1230 measures the distance d_reference along the ideal marker image 500 between the ideal marker positions corresponding to each pair of neighbouring points. As the ideal marker image 500 is not deformed by the surface of the object 110, the distance d_reference is a 2D Euclidean distance. The units of this distance d_reference will be used as the scale of the scaled 3D point cloud at the end of the point cloud scaling step 1120.
According to one arrangement, the physical marker size is known, d_reference is given in units of metres, and the scaled 3D point cloud at the end of the point cloud scaling step 1120 is therefore also in units of metres.
According to another arrangement, d_reference is given in units of pixels, and the scaled point cloud at the end of the point cloud scaling step 1120 is therefore also in units of printed marker pixels.
Sub-process 1110 proceeds from step 1230 to step 1240.
A typical distance ratio calculation step 1240 calculates a neighbour scale factor S_neighbour for each pair of neighbouring points as the fraction d_reference / d_surface for that particular pair of neighbouring points (that is, the distance along the ideal marker image 500 divided by the distance along the surface of the placed marker 500). The neighbour scale factor S_neighbour represents the scaling required to adjust the 3D positions of that neighbour pair to be in the units of distance along the ideal marker image 500. Then the typical neighbour scale factor S_typical is determined.
According to one arrangement, the typical neighbour scale factor S_typical is the mean of the neighbour scale factors S_neighbour of all the pairs of neighbour points found by the nearest neighbour finding step 1210. According to another arrangement, the typical neighbour scale factor S_typical is the median of the neighbour scale factors S_neighbour of the pairs of neighbour points. The typical neighbour scale factor is stored as the scale factor S estimated by the scale estimation sub-process 1110. The sub-process 1110 concludes at the conclusion of step 1240.
As described earlier, the method 1100 (which is implemented at step 440) proceeds from sub-process 1110 to steps 1120, 1130, and 1140. Once the surface fitting method 1100 is concluded at step 1140, the step 440 also concludes. Referring to Fig. 4, the method 400 proceeds from step 440 to step 450.
Step 450 of parametrisation is described with reference to Fig. 13. Step 450 commences at a surface alignment step 452, which aligns a best-fit plane through the fitted surface to the (flat) reference marker 500. This is performed as a least-squares rigid fit of the rotation and translation of the fitted surface to the (flat) reference marker 500 at the z = 0 plane. The camera poses are also transformed according to this rotation and translation so as to maintain the cameras' relative geometry to the fitted surface. Before surface alignment step 452, the origin and orientation of the fitted surface in 3D space is arbitrary. The surface alignment step 452 is performed to ensure that the 3D origin and orientation of the fitted surface are consistent with respect to the placed marker 500.
Step 450 then proceeds from step 452 to the surface evaluation step 454, which evaluates the surface position using the fitted low-order polynomial functions along a regular grid of positions of the placed marker 500, including points in the marker's cut-out region 570. This grid is typically denser than the points present in the high accuracy 3D point cloud, so the evaluation yields a dense grid of associations between 3D surface points and points along a 2D
grid (that is, with respect to the placed marker 500). These associations form a parametrisation from the 3D surface onto the 2D space of the placed marker 500. Step 450 concludes at the conclusion of step 454.
Texture alignment
To accurately estimate the BRDF of an object 110 at a point, the colour and intensity of that point under a variety of viewing positions must be sampled. When that sampling is via a collection of images taken at different viewing positions 120, 130, 140, the point on the object 110 must be matched to corresponding positions on each image wherefrom the colour and intensity may be sampled.
If the 3D position of a point on an object 110 and the camera pose of an image are both accurately known, the 3D point can be mapped into a 2D position on the image by performing a projection operation. To obtain the highest accuracy, it may also be necessary to incorporate various lens distortion terms and camera sensor effects. Examples of the modelling of lens distortion terms and camera sensor effects are a radial low-order polynomial mapping between rectilinear Cartesian sensor co-ordinates and lens distorted image co-ordinates including a 2D offset of the centre of the camera sensor from the optical axis, and the like. If the 2D position on all images of a single object point is known, the RGB value of that object point can be interpolated from all of the images, giving colour samples usable for BRDF estimation at that object point.
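A minimal sketch of this projection operation using OpenCV is shown below; the pose, intrinsic and distortion values are illustrative assumptions rather than calibrated values for any particular camera.

```python
import cv2
import numpy as np

# Illustrative pose, intrinsics and distortion values; in practice these come from
# the camera pose determination and whatever intrinsic calibration is available.
object_points = np.array([[0.00, 0.00, 0.000],
                          [0.05, 0.00, 0.002],
                          [0.05, 0.05, 0.004]])          # 3D surface points, in metres
rvec = np.array([0.1, -0.2, 0.05])                       # camera rotation (rotation vector)
tvec = np.array([0.0, 0.0, 0.6])                         # camera translation
K = np.array([[3000.0, 0.0, 2000.0],
              [0.0, 3000.0, 1500.0],
              [0.0, 0.0, 1.0]])                          # pinhole intrinsics
dist = np.array([-0.05, 0.01, 0.0, 0.0, 0.0])            # low-order radial distortion terms

image_points, _ = cv2.projectPoints(object_points, rvec, tvec, K, dist)
print(image_points.reshape(-1, 2))                       # 2D pixel positions to sample from
```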
As has been described in step 430, accurate 3D points and poses can be obtained by measuring an alignment pattern 600 of the marker 500 to find common object points in the marker 500, and then using bundle adjustment to identify accurate 3D points and poses. By use of a surface fit 440 which fits accurate 3D points on the marker 500, a very accurate mapping can be determined from all 2D positions on the marker 500 to corresponding 3D positions, and thence to corresponding 2D positions on the images.
The ideal marker image 500 forms a 2D coordinate system divided into pixels on a regular rectangular grid. By using the surface fit, each pixel in this 2D coordinate system can be assigned a 3D point, representing a point cloud. Each 3D point in this point cloud can be projected onto each image using the camera pose, resulting in mappings from the marker coordinate system to corresponding pixels in every image. By the use of these mappings, a set of texture maps can be generated with each corresponding position in every texture map
corresponding to different views of the same point on the object surface, from which a BRDF can be calculated.
The method 1400 (which is implemented at step 460) to project textures to texture maps is described with reference to figure 14.
The first inputs to this method 1400 are a parametrized approximate point cloud 1401 (generated from parametrization step 450) and a set of camera poses 1402 (generated from camera pose determination step 430).
The method 1400 commences at step 1410, in which each point in the set of point clouds 1401 is projected using a camera pose 1402 to determine a position of that point in one of the digital photographs (captured at step 410). The projection operation is a projection operation defined for 3D geometry, and incorporates determined parameters intrinsic to the camera, such as focal parameter, sensor pixel pitch, lens distortion and other parameters required to accurately calculate the projected position. This mapping from a 3D point in the point cloud to a 2D point in a camera image is stored as a 2-dimensional vector in a projection warp image at a position corresponding to the pixel in the reference marker image. This vector is determined by combining the point cloud image, which maps from marker position to 3D point, with the projection operation, which maps from 3D point to a position in a camera image, resulting in a 2D warp from a 2D coordinate of the marker position to a 2D coordinate in the camera image.
The result of step 1410 is a collection of projection warps 1403, with each pixel in a projection warp image providing a mapping from a reference marker coordinate pixel to a camera image pixel. The method 1400 proceeds from step 1410 to step 1420.
Step 1420 warps the camera images using an inverse warping algorithm and takes as input the projection warps 1403 and the camera images 1404, and outputs a set of projected texture maps 1405. A warped pixel in a projected texture map is generated by examining the corresponding pixel in the corresponding projection warp image, which gives a 2-dimensional vector specifying a position in the corresponding camera image, and estimating an RGB value at that position. As the position might not correspond exactly to a pixel position in the camera image, the value can be determined by interpolation. Examples of interpolation methods, in order of increasing quality, are nearest-neighbour, linear, cubic or sinc interpolation. For this application, high quality is desirable, so the interpolation method should be of a quality at least as good as cubic interpolation.
The result of step 1420 is a set of projected texture maps 1405 which are substantially aligned. Method 1400 then terminates at the conclusion of step 1420.
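The inverse warping of step 1420 can be sketched as follows with OpenCV, assuming the projection warp for one camera image has been assembled into an (H, W, 2) array of sampling positions; the function name and array layout are illustrative choices.

```python
import cv2
import numpy as np

def project_texture(camera_image, warp_xy):
    """Inverse-warp a camera image into a texture map. `warp_xy` has shape
    (H, W, 2) and gives, for each reference marker pixel, the (x, y) position to
    sample in the camera image; cubic interpolation is used because the sampled
    positions generally fall between camera pixels."""
    map_x = warp_xy[..., 0].astype(np.float32)
    map_y = warp_xy[..., 1].astype(np.float32)
    return cv2.remap(camera_image, map_x, map_y, interpolation=cv2.INTER_CUBIC)
```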
If the shape of the object 110 being measured conforms exactly to the 3D surface fit generated by interpolating the 3D points on the surface of the placed marker 500, and the point cloud can be correctly offset by the thickness of the printed marker 500, then the act of offsetting by the thickness of the printed marker 500 forms the texture alignment step 465, and the projected texture maps will then be accurately registered, and accurate samples can be gathered from the projected texture maps for BRDF estimation. The marker thickness offset method is useful when the object curvature is constant or smoothly varying across the placed marker 500 including the cut-out region 570, and when the object 110 has weak texture and would therefore be difficult to align in the cut-out region 570 using image comparison. For example, the object 110 may be a curved metal object with a glossy paint finish.
If the object 110 is not smooth, however, it is very unlikely that the surface fit will closely conform to the object surface. Errors of only a fraction of a millimetre will result in a parallax error of multiple pixels in the projection warps.
To deal with the parallax error, a registration process 465 using CMI alignment can be used to more closely align the texture maps, especially when the object 110 has texture suitable for alignment using image comparison. Because CMI alignment is robust to changes in image contrast and colour, such as lighting changes, accurate alignment can be achieved even across multiple views with large spatially varying changes in specular reflection intensity. The texture alignment method 1500 using CMI is described with reference to figure 15.
The inputs to method 1500 are the set of projected texture maps 1505 (output from method 1400), the set of camera images 1504 (captured at step 410), the set of projection warps 1403 (output of step 1410), and, optionally, the set of camera poses 1502 (output from step 430).
The method 1500 commences at step 1510 by selecting a reference texture map to use as the fixed image in a registration process. The fixed image should be chosen to contain a representative view of the object 110 to which all the other projected texture maps can be aligned, and, optionally, should also be of sufficient quality for registration to be performed as accurately as possible. One way to choose the representative view is to select the image corresponding to a camera pose which is closest to the average normal to the object surface.
P0005597AU_22263074_1
This will be the pose in which the normalized transformation vector (tX, ty, t) has the absolute largest value of t. If there are multiple images which have a similar value of t, or if the image selected is of particularly low quality, then a further selection step can be performed to choose a reference image which is in good focus and does not contain too much specular reflection. The method 1500 proceeds from step 1510 to step 1520.
Step 1520 calculates reference map warps by calculating optical flow fields between the selected reference texture map and all projected texture maps. One good algorithm to calculate such warps is covariance-based mutual information (CMI), used previously in the marker alignment step 420. The result of the step 1520 is a set of reference warp maps 1521, with each warp map 1521 being the same size as the texture maps, and the values in each warp map 1521 being a 2-dimensional vector indicating a position in the corresponding texture map which is registered to the corresponding position in the reference texture map.
These reference map warps 1521 could be used to register all of the projected texture maps 1505 with the reference texture map, but this is undesirable because the result would be images which have been interpolated twice, resulting in a loss of image quality.
The method 1500 proceeds from step 1520 to step 1530.
To preserve more quality in the registered images, each of the reference map warps 1521 is composed in step 1530 with the projection warps 1503, resulting in a set of composed registration warps 1522, with each registration warp 1503 mapping from pixels in the reference texture map to pixels in the corresponding digital photograph 1504. The method 1500 proceeds from step 1530 to step 1540.
In step 1540 the digital photographs 1504 are warped using the composed registration warps 1522 to create registered texture maps 1523. These registered texture maps can be used in the BRDF estimation step 470 of the method 400. The method 1500 ends at the conclusion of step 1540.
Because of the CMI alignment procedure (which is a method of implementing an optical flow procedure), the registration of the registered texture maps 1523 is of higher quality than that of the projected texture maps 1505, and the registered texture maps are a much better input to BRDF estimation step 470. However, the registered texture maps 1523 might not be perfect, as several problems can occur
during the optical flow estimation. Where the object 110 has periodic structure, such as a piece of woven material, the registration may be subject to "fencepost errors" in which registration occurs between nearby but similar features, and the error is propagated across a wide area. Where the object 110 is obscured by specular reflection, or is too smooth, the object 110 might not have any alignable features. Optical flow algorithms are not always perfect, and regions of an image may suffer from poor registration due to failure of the algorithm to operate correctly.
Another problem with using registered texture maps 1523 for BRDF estimation step 470 is that the 3D structure of the surface of the object 110 is neglected by the surface fit, thus the texture maps and point cloud are not sufficiently detailed to properly calculate surface normals or to render the object 110 using 3D geometry.
A further processing pass can be used to ameliorate these problems by optimising the point cloud for the surface, from which better-registered texture maps can be rendered and surface normals calculated. A method 1600 for refining the point cloud is described in figure 16.
The method 1600 has input of a parametrized point cloud 1601 (the approximate point cloud from step 450), a set of camera poses 1502, a set of reprojection warps 1522, and the set of camera images 1504. Initially, all of the points in the point cloud are to be corrected. The method 1600 commences at step 1610.
In step 1610, the method 1600 checks whether there are more points to be corrected. If there are more points to be corrected (YES), the method 1600 proceeds from step 1610 to step 1620. If there are no further points to be corrected (NO), the method 1600 outputs a corrected point cloud 1609.
In step 1620 the next 3D point is selected, and the selected 3D point corresponds to a position in the reference texture map. Through the reprojection warps 1522, this point also corresponds to a fairly accurate position in each of the camera images 1504 of the object 110. The method 1600 proceeds from step 1620 to step 1630.
In step 1630, the method 1600 calculates the reprojection error of the 3D point mapped onto the camera images 1504. The reprojection error is calculated from the root-mean-square sum of the distances between a projected 2D point determined by projecting the 3D point onto the camera images 1504 using the camera pose 1502, and the fairly accurate 2D position of that point calculated through the reprojection warps 1522.
Because the 3D point is only an approximation of the true position of the 3D point on that part of the object 110, the reprojection error is likely to be high, possibly being tens of pixels. The method 1600 proceeds from step 1630 to step 1640.
In step 1640, the method 1600 determines a corrected position for the 3D point which minimizes the reprojection error. This position is determined by triangulation, and can be found very accurately by searching for a solution using an optimization method such as Nelder-Mead. Acceptable accuracy can also be attained much more quickly by making linear approximations and directly calculating by linear least squares a 3D point which minimizes the reprojection error.
Accuracy can be improved still further by ignoring points which are likely to have high error, for example, measurements involving poses with a camera view altitude angle lower than 50 degrees, or which have a reprojection error more than double the minimum reprojection error.
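A sketch of the linear least-squares variant of step 1640 is given below, assuming a 3x4 projection matrix is available for each retained camera view and that the 2D positions come from the reprojection warps 1522; the function is a generic direct linear transform, not the exact implementation of the method 1600.

```python
import numpy as np

def triangulate_point(projection_matrices, points_2d):
    """Linear least-squares (DLT) approximation to the corrected 3D point: each
    view contributes two rows built from its 3x4 projection matrix P and the 2D
    position (x, y) given by the reprojection warps; the SVD solution minimises
    the algebraic error. Unreliable views should be excluded before calling this."""
    rows = []
    for P, (x, y) in zip(projection_matrices, points_2d):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.array(rows))
    X = vt[-1]
    return X[:3] / X[3]          # corrected 3D point in inhomogeneous coordinates
```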
The method 1600 proceeds from step 1640 to step 1650.
In step 1650, the method 1600 replaces the approximate 3D point in the point cloud with a corrected point, and then the method 1600 returns from step 1650 to step 1610 to process more points.
As discussed above, if in step 1610 there are no more points to be processed (NO), the corrected point cloud is output 1609 and the method 1600 concludes.
The corrected point cloud 1609 can be used to create optimised texture maps which in some circumstances can be superior to registered texture maps 1523, and have the advantage of representing the 3D structure of the object 110, from which normals can be calculated, and an accurate mesh generated.
The method to project optimised texture maps is identical to method 1400 except that the input to the method is the corrected point cloud 1609 from method 1600 instead of the parametrized point cloud 1401.
BRDF estimation
The process 470 of BRDF estimation is detailed in Fig. 17 as process 1700.
BRDF estimation process 1700 begins at normalisation step 1710, in which the aligned texture maps (from step 465) are normalised for light intensity. The pixel intensities of the aligned texture maps depend on the intensity of the flash or lighting used when the camera images received in step 410 were captured. These intensities may be varied, for example in order to capture data at a high dynamic range. To normalise the intensities for BRDF estimation, the pixel intensities in each image are multiplied by the relative lighting intensity. For example, in an image set in which a first subset were captured with 100% flash power, and a second subset were captured with 50% flash power, the pixel values of the aligned texture images in the second subset are halved. Some pixel intensities in some captured images may be saturated. Saturated pixel intensities are not used for BRDF estimation. The method 1700 proceeds from step 1710 to step 1720.
In light pose estimation step 1720, the pose of the light source is estimated in the capture co-ordinate system. The printed marker 500 includes a white rectangular band 595 in the marker boundary, which is a constant white colour. The light source direction may be estimated by computing a histogram of intensities in two spherical co-ordinate dimensions and finding a peak. That is, the pixel intensities of the constant white region 595 in each aligned texture map image are assigned to histogram bins according to the altitude angle and azimuth angle formed by a ray leaving the camera and reflecting off the surface at the corresponding pixel, according to the macro-scale surface normals determined in surface fitting step 440. After this histogram binning process is repeated for all of the aligned texture map images, the light source direction may be estimated as the histogram bin with the brightest intensity. Other methods may be used, such as fitting a radial function to the histogram, and using the peak of the fitted function as the light source direction estimate. The light source distance may be estimated manually according to the capture arrangement, or arbitrarily assigned to a large distance, in order to acquire an absolute 3D light source position.
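The histogram arrangement for the light source direction can be sketched as follows; the bin count, the use of the mean intensity per bin, and the altitude/azimuth convention are assumptions chosen for illustration.

```python
import numpy as np

def estimate_light_direction(intensities, reflected_dirs, n_bins=90):
    """Bin white-band pixel intensities by the altitude and azimuth of the
    reflected camera rays and return the direction of the brightest bin.
    `reflected_dirs` holds one unit vector per white-band sample: the camera ray
    reflected off the surface using the macro-scale normal at that pixel."""
    x, y, z = reflected_dirs[:, 0], reflected_dirs[:, 1], reflected_dirs[:, 2]
    altitude = np.arcsin(np.clip(z, -1.0, 1.0))
    azimuth = np.arctan2(y, x)
    sums, alt_edges, az_edges = np.histogram2d(altitude, azimuth, bins=n_bins,
                                               weights=intensities)
    counts, _, _ = np.histogram2d(altitude, azimuth, bins=n_bins)
    mean_intensity = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    i, j = np.unravel_index(np.argmax(mean_intensity), mean_intensity.shape)
    alt = 0.5 * (alt_edges[i] + alt_edges[i + 1])
    az = 0.5 * (az_edges[j] + az_edges[j + 1])
    return np.array([np.cos(alt) * np.cos(az), np.cos(alt) * np.sin(az), np.sin(alt)])
```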
Alternatively, in step 1720, the light source distance and direction may both be measured manually during image capturing step 410 to determine a 3D light source position.
The method 1700 proceeds from step 1720 to step 1730.
In normal estimation step 1730, the specular normals of the surface of the object 110 are estimated. The specular normals are typically per-pixel vectors defining the local direction at which light is reflected according to the aligned texture map images, in contrast to the macro
scale normals computed in surface fitting step 440, which are typically normals of a fitted low degree parametric surface to a sparse set of feature points. The specular normals may be characterised as higher accuracy normals than the macro-scale normals.
The specular normals are estimated on a per-pixel basis. For each pixel location, a set of pixel samples is formed, each pixel sample comprising a pixel intensity and a 3D direction. One pixel sample is formed per aligned image. The pixel intensity is obtained from the pixel co-ordinates of the corresponding aligned image. The direction is the average of the camera direction and the light source direction. The samples are then clustered into two clusters according to intensity. K-means clustering or other clustering methods known in the art may be used. The specular normal direction may be determined as the centroid or weighted average direction of the specular cluster (the cluster with higher intensity). If alignment is imperfect, it can be beneficial to detect and remove outliers from the specular cluster; that is, samples whose direction varies greatly from the mean direction of the cluster may be removed from the specular cluster. The method 1700 proceeds from step 1730 to step 1740.
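A per-pixel sketch of this clustering arrangement is shown below, using k-means with two clusters and the intensity-weighted average direction of the brighter cluster; the outlier removal described above is omitted for brevity, and the use of scikit-learn is an implementation choice rather than a requirement.

```python
import numpy as np
from sklearn.cluster import KMeans

def specular_normal(intensities, directions):
    """Cluster the per-view samples of one pixel into two intensity clusters and
    return the intensity-weighted mean direction of the brighter (specular)
    cluster. `directions` holds one unit vector per aligned image, the average of
    the camera direction and the light source direction."""
    intensities = np.asarray(intensities, dtype=np.float64)
    directions = np.asarray(directions, dtype=np.float64)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(intensities.reshape(-1, 1))
    cluster_means = [intensities[labels == c].mean() for c in (0, 1)]
    specular = labels == int(np.argmax(cluster_means))
    normal = np.average(directions[specular], axis=0, weights=intensities[specular])
    return normal / np.linalg.norm(normal)
```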
In diffuse colour estimation step 1740, a diffuse colour at each pixel is estimated. To determine a diffuse colour at a pixel location, the minimum intensity sample may be used. Alternatively, the 10th-percentile brightest sample (that is, the sample brighter than 10% of all other samples and less bright than 90% of other samples) may be selected as the diffuse colour, in order to avoid the impact of outlier dark samples which may be present. The method 1700 proceeds from step 1740 to step 1750.
In specular estimation step 1750, specular amplitude and roughness values are determined at each pixel. The specular amplitude is a measure of how brightly the local surface reflects specular light. A surface reflecting most of the incident light has a high specular amplitude, while a surface absorbing most of the incident light has a low specular amplitude. The specular roughness is a measure of the spread of reflected specular light at the local surface. A very rough surface which reflects incident light in all directions has a high specular roughness, while a mirror-like surface which reflects incident light in only the reflected direction has a low specular roughness.
The specular amplitude at a pixel location may be estimated according to the intensity of the brightest sample of that pixel across the aligned images for each camera view. The specular roughness at a pixel location may be estimated according to the sharpness of the intensity drop-off around the specular peak. These values may be determined empirically based on these properties of the pixel samples. For example, the specular amplitude may be set to the intensity of the brightest sample, and the specular roughness may be set to the variance of the angular distances of the specular samples for the pixel from the specular peak angle. Alternatively, these values may be determined by fitting a BRDF function to the pixel samples at each pixel.
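The empirical estimates described above may be sketched as follows; taking the direction of the brightest sample as the specular peak direction is one possible interpretation of the "specular peak angle", and the names are illustrative only.

```python
import numpy as np

def estimate_specular_amplitude_roughness(intensities, directions, specular_mask):
    """intensities: (N,), directions: (N, 3) unit vectors, specular_mask: boolean (N,)."""
    peak = int(np.argmax(intensities))
    amplitude = float(intensities[peak])                       # brightest sample intensity
    # Angular distance of each specular-cluster sample from the peak direction.
    cosines = np.clip(directions[specular_mask] @ directions[peak], -1.0, 1.0)
    roughness = float(np.var(np.arccos(cosines)))              # spread around the peak
    return amplitude, roughness
```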
The resulting specular normals, diffuse colour, specular amplitude, and specular roughness values at each pixel together form the parameters of a spatially-varying BRDF model for the material surface. The present disclosure may generally be practised with any BRDF model. For BRDF models containing fewer parameters than specular normals, diffuse colour, specular amplitude, and specular roughness, estimation of one or more parameters may be omitted from the BRDF estimation process 1700. For BRDF models containing additional parameters, the additional parameters may be additionally estimated, or omitted from the model.
An exemplary BRDF model with which the present disclosure may be practised is the Phong model, which may be expressed as follows.
I_p = k_d (\hat{L} \cdot \hat{N}) + k_s (\hat{R} \cdot \hat{V})^{1/\alpha}
I_p, the reflected illumination at point p, is the sum of a diffuse term and a specular term. The diffuse term is the diffuse amplitude k_d multiplied by the dot product of the light source direction \hat{L} and the specular normal direction \hat{N}. The diffuse amplitude k_d may be determined inversely to the specular amplitude k_s, for example using k_d = 1 - k_s. The specular term is the specular amplitude k_s multiplied by the dot product of the reflection direction \hat{R} and the viewing direction \hat{V}, raised to the inverse power of the specular roughness \alpha. The reflection direction \hat{R} is the direction in which a ray from the light source would be perfectly reflected according to the local specular normal direction \hat{N}.
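For concreteness, a direct evaluation of the model above might look like the following sketch, with k_d = 1 - k_s as described. The function name, and the assumption that the direction vectors are supplied un-normalised, are illustrative only.

```python
import numpy as np

def phong_intensity(k_s, alpha, light_dir, normal, view_dir):
    """Evaluate I_p = k_d (L.N) + k_s (R.V)^(1/alpha) with k_d = 1 - k_s (illustrative)."""
    l = light_dir / np.linalg.norm(light_dir)
    n = normal / np.linalg.norm(normal)
    v = view_dir / np.linalg.norm(view_dir)
    r = 2.0 * np.dot(l, n) * n - l                 # mirror reflection of the light direction
    diffuse = (1.0 - k_s) * max(np.dot(l, n), 0.0)
    specular = k_s * max(np.dot(r, v), 0.0) ** (1.0 / alpha)
    return diffuse + specular
```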
The method 1700 ends at the conclusion of step 1750.
CMI alignment
To perform accurate shape reconstruction, diffuse colour estimation, and BRDF estimation of a 3-dimensional object from images of the object 110 taken from different camera positions 120, 130, 140, the images must be aligned to sub-pixel accuracy. Compared to one another, such images show perspective distortion, parallax, and occlusions. These differences mean that rigid alignment methods are inadequate to align the images, so non-rigid alignment methods are required. A suitable alignment method uses covariance-based mutual information (CMI).
CMI alignment generates a mapping from one image to another by generating an initial mapping, and then assessing the alignment quality and refining the mapping in a loop until convergence is reached. The alignment quality associated with a mapping is measured using mutual information, which is a measure of pointwise statistical commonality between two images in terms of information theory. The mapping being assessed (from a distorted image to a reference image) is applied to the distorted image, and mutual information is measured between the reference image and the transformed distorted image. The colour information of each image is quantised independently into the same number of colour clusters, for example 256 clusters, for the purposes of calculating the mutual information. This may be done using the k-means algorithm, for example. Each colour cluster is represented by a colour label (such as a unique integer per colour cluster in that image), and these labels are the elements over which the mutual information is calculated. A mutual information measure I for a first image containing a set of pixels associated with a set of labels A = {a_i} and a second image containing a set of pixels associated with a set of labels B = {b_j} is defined as follows in Equation (5):

I = \sum_{i,j} P(a_i, b_j) \log_2 \frac{P(a_i, b_j)}{P(a_i)\,P(b_j)},     Equation (5)

where P(a_i, b_j) is the joint probability value of the two labels a_i and b_j co-occurring at the same pixel position, P(a_i) and P(b_j) are the marginal probability distribution values of the respective labels a_i and b_j, and log_2 is the logarithm function of base 2. Further, i is the index of the label a_i and j is the index of the label b_j. If the product of the marginal probability values P(a_i) and P(b_j) is zero (0), then such a pixel pair is ignored. According to Equation (5), the mutual information measure quantifies the extent to which labels (e.g., a_i, b_j) co-occur at the same pixel position in the two images relative to the number of occurrences of those individual labels in the individual images. The extent of label co-occurrences is typically greater between aligned images than between unaligned images, according to the mutual information measure. In particular, one-dimensional histograms of the labels in each image are used to estimate the marginal probabilities of the labels (i.e. P(a_i) and P(b_j)), and a pairwise histogram of co-located labels is used to estimate the joint probabilities (i.e. P(a_i, b_j)).
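A direct implementation of Equation (5) from two co-registered label images may be sketched as follows. The label images are assumed to hold integer cluster labels in the range [0, n_labels), and the function name is illustrative.

```python
import numpy as np

def mutual_information(labels_a, labels_b, n_labels=256):
    """Equation (5) over co-located labels of two images of equal size (illustrative sketch)."""
    joint = np.zeros((n_labels, n_labels), dtype=float)
    np.add.at(joint, (labels_a.ravel(), labels_b.ravel()), 1.0)  # pairwise histogram
    joint /= joint.sum()                                         # joint probabilities P(a_i, b_j)
    pa = joint.sum(axis=1)                                       # marginal probabilities P(a_i)
    pb = joint.sum(axis=0)                                       # marginal probabilities P(b_j)
    outer = np.outer(pa, pb)
    valid = (joint > 0) & (outer > 0)        # pixel pairs with a zero marginal product are ignored
    return float(np.sum(joint[valid] * np.log2(joint[valid] / outer[valid])))
```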
The mutual information measure I may be calculated only for locations within the overlapping region of the two images. The overlapping region is determined, for example, by creating a mask for each of the distorted image and the reference image, and applying the mapping being assessed to the distorted image's mask to produce a transformed distorted image mask. Locations are only within the overlapping region, and thus considered for the probability distributions (i.e. P(a_i, b_j), P(a_i), and P(b_j)), if they are within the intersection of the reference image mask and the transformed distorted image mask.
Alternatively, instead of creating a transformed distorted image, the probability distributions (i.e. P(a_i, b_j), P(a_i), and P(b_j)) for the mutual information measure I can be estimated directly from the two images and the mapping being assessed using the technique of Partial Volume Interpolation. According to Partial Volume Interpolation, histograms involving the transformed distorted image are instead calculated by first transforming pixel positions (that is, integer-valued coordinates) of the distorted image onto the coordinate space of the reference image using the mapping. Then the label associated with each pixel of the distorted image is spatially distributed across the pixel positions surrounding the associated transformed coordinate (i.e. in the coordinate space of the reference image). The spatial distribution is controlled by a kernel of weights that sum to 1, centred on the transformed coordinate, for example a trilinear interpolation kernel or other spatial distribution kernels known in the literature. The histograms involving the transformed distorted image are then calculated using the spatially distributed labels.
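The following sketch shows Partial Volume Interpolation for the two-dimensional case, distributing each distorted-image label over the four reference-grid positions surrounding its transformed coordinate with bilinear weights (the two-dimensional analogue of the trilinear kernel mentioned above). The array layout and names are assumptions of this sketch.

```python
import numpy as np

def pvi_joint_histogram(labels_d, labels_r, coords, n_labels=256):
    """labels_d: (N,) distorted-image labels, labels_r: (H, W) reference labels,
    coords: (N, 2) transformed (row, col) positions of the distorted pixels."""
    h, w = labels_r.shape
    joint = np.zeros((n_labels, n_labels), dtype=float)
    r0 = np.floor(coords[:, 0]).astype(int)
    c0 = np.floor(coords[:, 1]).astype(int)
    fr, fc = coords[:, 0] - r0, coords[:, 1] - c0
    # Bilinear kernel weights over the four surrounding integer positions (sum to 1).
    for dr, dc, wgt in ((0, 0, (1 - fr) * (1 - fc)), (0, 1, (1 - fr) * fc),
                        (1, 0, fr * (1 - fc)), (1, 1, fr * fc)):
        rr, cc = r0 + dr, c0 + dc
        ok = (rr >= 0) & (rr < h) & (cc >= 0) & (cc < w)
        np.add.at(joint, (labels_d[ok], labels_r[rr[ok], cc[ok]]), wgt[ok])
    return joint / max(joint.sum(), 1e-12)        # normalised joint histogram
```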
The mutual information measure I of two related images is typically higher when the two images are well aligned than when they are poorly aligned.
The alignment process estimates a displacement field, or mapping, comprising an array of 2D vectors. In the displacement field each vector describes the shift for a pixel from the distorted image to the reference image.
The displacement field is estimated by first creating an initial displacement field. The initial displacement field is the identity mapping consisting of a set of (0, 0) vectors. Alternatively, the initial displacement field may be calculated using an initial estimate of the mapping from the distorted image to the reference image, if available. An initial estimate of the mapping may be available from a prior coarse alignment step, or from geometric calculations of the image distortion based on the camera pose relative to a reference object in the scene.
Displacement field estimation then proceeds by assigning colour labels to each pixel in the images, using colour clustering as described above. A first pixel is selected in the distorted image, and a second pixel is determined in the reference image by using the initial displacement field. A set of third pixels is selected from the reference image, using a 3x3 neighbourhood around the second pixel.
A covariance score is calculated for each pixel in the set of third pixels, which estimates the statistical dependence between the label of the first pixel and the labels of each of the third pixels. The covariance score C_{i,j} for labels (a_i, b_j) is calculated using the marginal and joint histograms determined using Partial Volume Interpolation, as described above. The covariance score is calculated using Equation (6):
C_{i,j} = \frac{P(a_i, b_j)}{P(a_i, b_j) + P(a_i)\,P(b_j) + \epsilon},     Equation (6)
where P(a_i, b_j) is the joint probability estimate of labels a_i and b_j placed at corresponding positions of the distorted image and the reference image, determined based on the joint histogram of the two images, P(a_i) is the probability estimate of the label a_i appearing in the distorted image, determined based on the marginal histogram of the distorted image, and P(b_j) is the probability estimate of the label b_j appearing in the reference image, determined based on the histogram of the reference image. \epsilon is a regularisation term to prevent a division-by-zero error, and can be an extremely small value (e.g., 0.00001). Corresponding positions for pixels in the distorted image and the reference image are determined using the initial displacement field. In Equation (6), the covariance score is a ratio, where the numerator of the ratio is the joint probability estimate (i.e., P(a_i, b_j)), and the denominator of the ratio is the joint probability estimate (i.e., P(a_i, b_j)) added to the product of the marginal probability estimates (i.e., P(a_i) and P(b_j)) added to the regularisation term (i.e., \epsilon).
The covariance score C_{i,j} has a value between 0 and 1, and takes on values similar to a probability. When the two labels appear in both images but rarely co-occur, C_{i,j} approaches 0, i.e. P(a_i, b_j) ≪ P(a_i)P(b_j). C_{i,j} is 0.5 where the two labels are statistically independent, i.e. P(a_i, b_j) = P(a_i)P(b_j). C_{i,j} approaches 1.0 as the two labels co-occur more often than not, i.e. P(a_i, b_j) ≫ P(a_i)P(b_j).
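Given a normalised joint histogram such as the one produced by the Partial Volume Interpolation sketch above, the full table of covariance scores of Equation (6) can be computed in a few lines; the vectorised form below is illustrative only.

```python
import numpy as np

def covariance_scores(joint, eps=1e-5):
    """Equation (6) for every label pair; `joint` is a normalised (n_labels, n_labels) histogram."""
    pa = joint.sum(axis=1, keepdims=True)          # marginal P(a_i), column vector
    pb = joint.sum(axis=0, keepdims=True)          # marginal P(b_j), row vector
    return joint / (joint + pa * pb + eps)         # C_ij in [0, 1], 0.5 at independence
```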
Candidate shift vectors are calculated for each of the third pixels, where each candidate shift vector is the vector from the second pixel to one of the third pixels.
An adjustment shift vector is then calculated using a weighted sum of the candidate shift vectors for each of the third pixels, where the weight for each candidate shift vector is the covariance score C_{i,j} for the corresponding third pixel. The adjustment shift vector is used to update the initial displacement field, so that the updated displacement field for the first pixel becomes a more accurate estimate of the alignment between the distorted image and the reference image. The process is repeated by selecting each first pixel in the distorted image, creating an updated displacement field with increased accuracy. A regularisation step is then applied to smooth any outlier vectors, by applying a low-pass filter to the displacement array.
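One iteration of the displacement field update described above may be sketched as follows. Normalising the weighted sum by the total covariance weight, and the use of a simple 3x3 mean filter for the regularisation step, are assumptions of this sketch rather than requirements of the method.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def update_displacement_field(labels_d, labels_r, disp, cov):
    """labels_d/labels_r: (H, W) label images, disp: (H, W, 2) row/col shifts,
    cov: (n_labels, n_labels) covariance score table from Equation (6)."""
    h, w = labels_d.shape
    new_disp = disp.copy()
    for r in range(h):
        for c in range(w):
            r2 = int(round(r + disp[r, c, 0]))     # second pixel, in the reference image
            c2 = int(round(c + disp[r, c, 1]))
            shift, weight = np.zeros(2), 0.0
            for dr in (-1, 0, 1):                  # 3x3 neighbourhood of third pixels
                for dc in (-1, 0, 1):
                    r3, c3 = r2 + dr, c2 + dc
                    if 0 <= r3 < h and 0 <= c3 < w:
                        wgt = cov[labels_d[r, c], labels_r[r3, c3]]
                        shift += wgt * np.array([dr, dc])   # weighted candidate shift vector
                        weight += wgt
            if weight > 0.0:
                new_disp[r, c] += shift / weight   # adjustment shift vector (normalised here)
    # Regularisation: low-pass filter the displacement array to smooth outlier vectors.
    for k in (0, 1):
        new_disp[..., k] = uniform_filter(new_disp[..., k], size=3)
    return new_disp
```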
The displacement field estimation method then determines whether the alignment is complete based upon an estimate of convergence. Examples of suitable convergence tests are a predefined maximum iteration number, or a predefined threshold value which halts the iteration when the threshold value exceeds the root-mean-square magnitude of the adjustment shift vectors across the displacement field. An example threshold value is 0.001 pixels. In some implementations, the predefined maximum iteration number is set to 1. In the majority of cases, however, to achieve accurate registration, the maximum iteration number is set to at least 10. For smaller images (e.g. 64x64 pixels) the maximum iteration number can be set to 100. When the alignment is complete, the updated displacement field becomes the final displacement field. The final displacement field may then be used to warp the distorted image into a coordinate space substantially in alignment with the reference image.
Example(s)/Use Case(s)
In an example use case, the method 300 is used for e-commerce. A sample patch of material is selected from an item for sale on a website, for example a region on the side of a leather handbag, the sleeve of a shirt, or the fabric of a dress. Step 310 is used to capture the shape and BRDF of the sample patch. Step 320 is used to render the object and BRDF on the e-commerce website. A user considering purchasing the item for sale can then view a digital representation of the item, which should more accurately represent the physical item.
In another example use case, the method 300 is used for visual effects in the video, television and cinema industry. When planning a scene which includes visual effects, the visual effects supervisor identifies physical set pieces or props that need to be virtually animated or destroyed using visual effects. The appearance could be manually painted by artists, but this takes significant time and skill. Instead, the object shape and BRDF of the set piece or prop is captured during movie production using step 310. In post-production, the shape and BRDF are edited and manipulated by artists and then rendered to appear to animate or break apart. The rendered appearance of the virtual item should more accurately represent the physical set piece or prop. The audience watching the edited movie should not be able to perceive that the object shown in the movie is a virtual object.

Claims (20)

CLAIMS
The claim(s) defining the invention are as follows:
1. A method of determining the bidirectional reflectance distribution function (BRDF) of an object using a marker placed on the object, the object having an unknown shape, the method comprising:
capturing, by an image capturing device, a plurality of images of the object, the object being illuminated by a light source at a fixed location, the image capturing device capturing the plurality of images from different positions in an un-calibrated manner;
aligning corresponding regions of the plurality of images using the marker;
determining a camera pose for each of the plurality of images based on the alignment of the corresponding regions of the plurality of images;
determining a shape of the object using a 3D alignment method based on the alignment of the corresponding regions of the plurality of images; and
estimating the BRDF of the object from the aligned regions and the determined shape of the object.
2. The method of claim 1, further comprising:
determining a three dimensional point cloud of the shape of the object;
parametrising the determined shape of the object to a two dimensional pixel grid;
creating a set of texture maps corresponding to the set of plurality of images based on the determined camera poses, the three dimensional point cloud, and the parametrised two dimensional pixel grid, wherein each texture map provides a mapping between each of the plurality of images and a texture map coordinate system; and
aligning the texture maps.
3. The method of claim 1, wherein the alignment of corresponding regions of the plurality of images using the marker comprises:
determining an orientation of the marker in each of the plurality of images;
determining a homography of an ideal marker image based on the determined orientation of the marker;
aligning each of the plurality of images to the ideal marker image to generate a corresponding plurality of aligned images; and
determining feature points of the marker in each of the plurality of aligned images.
4. The method of claim 3, wherein the determination of feature points comprises:
determining the region in each of the plurality of aligned images; and
determining a position of the region in each of the plurality of aligned images using the marker.
5. The method of claim 1, wherein the determination of the shape of the object comprises:
performing surface fitting using the marker.
6. The method of claim 5, wherein the surface fitting comprises:
determining a physical geodesic distance between points on the marker;
determining a distance on an ideal marker corresponding to the determined physical geodesic distance; and
determining a typical distance ratio based on the determined physical geodesic distance and the determined distance.
7. The method of claim 2, wherein the alignment of the texture maps comprises: offsetting a thickness of the marker from the determined three dimensional point cloud of the shape of the object.
8. The method of claim 2, wherein the alignment of the texture maps uses covariance based mutual information (CMI).
9. The method of claim 8, wherein the CMI is used for calculating optical flow fields between texture maps.
10. The method of claim 9, wherein the alignment of the texture maps further comprises minimising reprojection errors between the three dimensional point cloud and the plurality of images.
11. A device comprising:
a processor;
a memory in communication with the processor, wherein the memory comprises a computer application program that is executable by the processor, the computer application program comprising a method of determining the bidirectional reflectance distribution function (BRDF) of an object using a marker placed on the object, the object having an unknown shape, the method comprising:
capturing, by an image capturing device, a plurality of images of the object, the object being illuminated by a light source at a fixed location, the image capturing device capturing the plurality of images from different positions in an un-calibrated manner;
aligning corresponding regions of the plurality of images using the marker;
determining a camera pose for each of the plurality of images based on the alignment of the corresponding regions of the plurality of images;
determining a shape of the object using a 3D alignment method based on the alignment of the corresponding regions of the plurality of images; and
estimating the BRDF of the object from the aligned regions and the determined shape of the object.
12. The device of claim 11, wherein the method further comprises:
determining a three dimensional point cloud of the shape of the object;
parametrising the determined shape of the object to a two dimensional pixel grid;
creating a set of texture maps corresponding to the set of plurality of images based on the determined camera poses, the three dimensional point cloud, and the parametrised two dimensional pixel grid, wherein each texture map provides a mapping between each of the plurality of images and a texture map coordinate system; and
aligning the texture maps.
13. The device of claim 11, wherein the alignment of corresponding regions of the plurality of images using the marker comprises:
determining an orientation of the marker in each of the plurality of images;
determining a homography of an ideal marker image based on the determined orientation of the marker;
aligning each of the plurality of images to the ideal marker image to generate a corresponding plurality of aligned images; and
determining feature points of the marker in each of the plurality of aligned images.
14. The device of claim 13, wherein the determination of feature points comprises:
determining the region in each of the plurality of aligned images; and
determining a position of the region in each of the plurality of aligned images using the marker.
15. The device of claim 11, wherein the determination of the shape of the object comprises:
performing surface fitting using the marker.
16. The device of claim 15, wherein the surface fitting comprises:
determining a physical geodesic distance between points on the marker;
determining a distance on an ideal marker corresponding to the determined physical geodesic distance; and
determining a typical distance ratio based on the determined physical geodesic distance and the determined distance.
17. The device of claim 12, wherein the alignment of the texture maps comprises: offsetting a thickness of the marker from the determined three dimensional point cloud of the shape of the object.
18. The device of claim 12, wherein the alignment of the texture maps uses covariance based mutual information (CMI).
19. The device of claim 18, wherein the CMI is used for calculating optical flow fields between texture maps.
20. The device of claim 19, wherein the alignment of the texture maps further comprises minimising reprojection errors between the three dimensional point cloud and the plurality of images.
Canon Kabushiki Kaisha
Patent Attorneys for the Applicant
SPRUSON&FERGUSON
[Drawings, 19 sheets. Figure titles and principal reference numerals:]
[Fig. 1: capture arrangement 100, showing the object 110 and camera positions 120, 130 and 140.]
[Fig. 2: flowchart of method 200, steps 210 to 250: receive set of camera images, apply structure from motion, convert point cloud to mesh, project camera view images onto mesh, estimate BRDF.]
[Fig. 3: flowchart of method 300: capture BRDF (310), render object and BRDF (320).]
[Fig. 4: flowchart of step 310, steps 410 to 470: capture images, align marker, determine camera poses, surface fitting, parametrise surface, project textures, align textures, estimate BRDF.]
[Fig. 5: printed marker 500, including features 560, 570, 580, 590 and the white band 595.]
[Fig. 6: pattern 600 with elements 610, 620, 650 and 660.]
[Fig. 7: flowchart of capture method 700, steps 710 to 750: place marker, set camera parameters, set flash power, capture image, more images?]
[Fig. 8: marker 500 placed on an object 820.]
[Fig. 9: flowchart of method 900, steps 910 to 940: detect fiducial marks, determine homography and approximate camera poses, perform non-rigid alignment of marker pattern, perform high accuracy feature point detection.]
[Fig. 10: flowchart of method 1000, steps 1020 to 1080: determine window size, extract window, choose a tile, correlate tile with window, detect peaks and offset, more tiles?, determine position from tile offsets.]
[Fig. 11: flowchart of method 1100, steps 1110 to 1140: estimate point cloud scale, scale point cloud, adjust camera poses for new scale, surface fitting.]
[Fig. 12: flowchart of step 1110, steps 1210 to 1240: find nearest neighbours of high accuracy 3D points, measure distance along surface of a placed marker, measure corresponding distance on the ideal marker image, calculate typical distance ratio.]
[Fig. 13: flowchart of step 450: align surface to axes of the marker (452), evaluate surface along regular grid (454).]
[Fig. 14: data flow 1400: project 3D point clouds (1410) and warp images (1420), using the point cloud, camera poses, projection warps and images to produce projected texture maps.]
[Fig. 15: data flow 1500: select reference map (1510), calculate reference map warps (1520), compose (1530), warp captured images (1540), producing reprojection warps and registered texture maps.]
[Fig. 16: flowchart of method 1600: for each point, calculate reprojection error, minimise reprojection error and replace point, producing a corrected point cloud.]
[Fig. 17: flowchart of BRDF estimation method 1700: normalise textures (1710), estimate light pose (1720), estimate specular normals (1730), estimate diffuse colour (1740), estimate specular amplitude and roughness (1750).]
[Figs. 18A and 18B: schematic block diagrams of a general-purpose computer system 1800, its processor 1805 and memory, on which the described arrangements can be practised.]
AU2019201822A 2019-03-15 2019-03-15 BRDF scanning using an imaging capture system Abandoned AU2019201822A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2019201822A AU2019201822A1 (en) 2019-03-15 2019-03-15 BRDF scanning using an imaging capture system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2019201822A AU2019201822A1 (en) 2019-03-15 2019-03-15 BRDF scanning using an imaging capture system

Publications (1)

Publication Number Publication Date
AU2019201822A1 true AU2019201822A1 (en) 2020-10-01

Family

ID=72608223

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2019201822A Abandoned AU2019201822A1 (en) 2019-03-15 2019-03-15 BRDF scanning using an imaging capture system

Country Status (1)

Country Link
AU (1) AU2019201822A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022266719A1 (en) * 2021-06-24 2022-12-29 Bandicoot Imaging Sciences Pty Ltd A method for generating a shimmer view of a physical object
CN115577146A (en) * 2022-10-28 2023-01-06 中国电子科技集团公司电子科学研究院 Multi-view tensor map vector embedding representation obtaining method
DK181751B1 (en) * 2023-07-12 2024-11-26 Phase One As METHOD AND APPARATUS FOR GENERATION OF PHOTOGRAMMETRY DATA
DK202370379A1 (en) * 2023-07-12 2024-11-26 Phase One As A method and device for generating photogrammetry data
WO2025092906A1 (en) * 2023-10-31 2025-05-08 北京字跳网络技术有限公司 Texture mapping method and apparatus, and electronic device and storage medium
CN119251395A (en) * 2024-09-14 2025-01-03 重庆科技大学 A method for determining the distance and range of three-dimensional scanning under given accuracy conditions

Similar Documents

Publication Publication Date Title
US12272020B2 (en) Method and system for image generation
US10916033B2 (en) System and method for determining a camera pose
Maier et al. Intrinsic3D: High-quality 3D reconstruction by joint appearance and geometry optimization with spatially-varying lighting
US10401716B2 (en) Calibration of projection systems
EP2024707B1 (en) Scanner system and method for scanning
JP4245963B2 (en) Method and system for calibrating multiple cameras using a calibration object
AU2019201822A1 (en) BRDF scanning using an imaging capture system
US20190188871A1 (en) Alignment of captured images by fusing colour and geometrical information
Willi et al. Robust geometric self-calibration of generic multi-projector camera systems
US11436791B2 (en) Methods and systems for acquiring svBRDF measurements
JP6579659B2 (en) Light source estimation apparatus and program
Li et al. 3D reconstruction and texture optimization using a sparse set of RGB-D cameras
Park et al. Surface light field fusion
Williams et al. Automatic image alignment for 3D environment modeling
Pintus et al. Techniques for seamless color registration and mapping on dense 3D models
Audet et al. Direct image alignment of projector-camera systems with planar surfaces
AU2019201825A1 (en) Multi-scale alignment pattern
EP3983998B1 (en) Method and system for extrinsic camera calibration
AU2018203328A1 (en) System and method for aligning views of a graphical object
Pulli et al. Mobile panoramic imaging system
Troccoli et al. Building illumination coherent 3D models of large-scale outdoor scenes
Abdul-Rahim et al. An in depth review paper on numerous image mosaicing approaches and techniques
Nguyen et al. Modelling of 3D objects using unconstrained and uncalibrated images taken with a handheld camera
Pei et al. OpenSubstance: A High-quality Measured Dataset of Multi-View and-Lighting Images and Shapes
Karami Image-based 3D metrology of non-collaborative surfaces

Legal Events

Date Code Title Description
MK5 Application lapsed section 142(2)(e) - patent request and compl. specification not accepted