
WO2017177363A1 - Methods and apparatuses for face hallucination - Google Patents


Info

Publication number
WO2017177363A1
WO2017177363A1 (PCT/CN2016/078960)
Authority
WO
WIPO (PCT)
Prior art keywords
image
hallucination
trained model
dense
network
Prior art date
Legal status
Ceased
Application number
PCT/CN2016/078960
Other languages
French (fr)
Inventor
Xiaoou Tang
Shizhan ZHU
Cheng Li
Chen Change Loy
Current Assignee
Sensetime Group Ltd
Original Assignee
Sensetime Group Ltd
Priority date
Filing date
Publication date
Application filed by Sensetime Group Ltd filed Critical Sensetime Group Ltd
Priority to PCT/CN2016/078960 (WO2017177363A1)
Priority to CN201680084409.3A (CN109313795B)
Publication of WO2017177363A1

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/73Deblurring; Sharpening
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration using local operators
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/169Holistic features and representations, i.e. based on the facial image taken as a whole
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Definitions

  • at step S303, the Gauss-Newton steepest descent regression matrix Rk is calculated from the learned average project-out Jacobian Jk.
  • the process 300 may further include steps S304 and S305.
  • at step S304, the deformation coefficients for both the correspondence training set and the hallucination training set are updated.
  • at step S305, the dense correspondence field for each location z is calculated for the hallucination training set. The deformation coefficients and the dense correspondence field obtained at steps S304 and S305 may be used in the later training process.
  • Fig. 4 illustrates a flow chart of the testing process 400 of the estimation unit according to an embodiment of the present application.
  • location for each landmark is obtained from the facial image input to the estimation unit.
  • the input image is the original low-resolution image in the first iteration.
  • in subsequent iterations, the input comprises the image obtained in the (k-1) th iteration, as well as the deformation coefficient obtained in the (k-1) th iteration.
  • the location of each landmark in the input image is obtained.
  • the SIFT feature from around the location of the landmark is obtained.
  • the SIFT feature is the shape-indexed feature described above.
  • the features from all the landmarks are combined as an appearance eigen vector.
  • the deformation coefficients are updated via regression according to the equation (2) .
  • the dense correspondence field for each location z is computed.
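The testing pass of the estimation unit outlined above can be sketched as follows. This is an illustrative sketch only: the patch features stand in for SIFT descriptors, and the zero regressor, bases and template grid are hypothetical stand-ins for the learned and pre-defined parameters.

```python
import numpy as np

# Toy sketch of the estimation unit's testing pass: sample a local
# feature around each landmark, concatenate into an appearance vector,
# regress updated coefficients, then evaluate the dense field.

def local_feature(image, landmark, size=2):
    r, c = landmark
    return image[r:r + size, c:c + size].ravel()  # stand-in for a SIFT descriptor

def estimation_pass(image, landmarks, R, phi_bar, p_prev, z, bases):
    phi = np.concatenate([local_feature(image, lm) for lm in landmarks])
    p = p_prev + R @ (phi - phi_bar)   # regression update of the coefficients
    return z + bases @ p               # dense field W(z) = z + B(z) p

rng = np.random.default_rng(1)
image = rng.random((6, 6))
landmarks = [(0, 0), (3, 3)]
feat_dim = 2 * 4                       # two landmarks, one 2x2 patch each
R = np.zeros((3, feat_dim))            # zero stand-in for the learned regressor
phi_bar = np.zeros(feat_dim)
z = np.zeros((5, 2))                   # toy template coordinates
bases = np.zeros((5, 2, 3))            # toy deformation bases

field = estimation_pass(image, landmarks, R, phi_bar, np.zeros(3), z, bases)
assert field.shape == (5, 2)
```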
  • Fig. 5 illustrates a flow chart of the training process 500 of the hallucination unit according to an embodiment of the present application.
  • images from the training sets are upsampled by bicubic interpolation.
  • the warped high-frequency prior is obtained according to the dense correspondence field.
  • the deep bi-network is trained with three steps: pre-training the common sub-network, pre-training the high-frequency sub-network, and tuning the whole bi-network end-to-end.
  • the bi-network coefficient may be stored in the trained model.
  • a forward pass of the bi-network may be performed to compute the predicted image for both the hallucination training set and the estimation training set.
  • Fig. 6 illustrates a flow chart of the testing process 600 of the hallucination unit according to an embodiment of the present application.
  • an input image I k-1 is upsampled by bicubic interpolation to obtain an upsampled image ⁇ I k-1 .
  • the warped high-frequency prior is obtained according to the dense correspondence field.
  • the learned bi-network coefficient g k is used to forward pass the deep bi-network with the two inputs, i.e., the upsampled image ↑I k-1 and the warped high-frequency prior, so that the image I k is obtained.
  • Algorithm 1 is an exemplary training algorithm for learning the parameters by the apparatus according to an embodiment of the present application.
  • Algorithm 2 is an exemplary testing algorithm for hallucinating a low-resolution face according to an embodiment of the present application.
  • Fig. 7 is a structural schematic diagram of an embodiment of computer equipment provided by the present invention.
  • the computer equipment can be used for implementing the face hallucination method provided in the above embodiments.
  • the computer equipment may differ greatly depending on its configuration or performance, and may include one or more processors (e.g. Central Processing Units, CPUs) 710 and a memory 720.
  • the memory 720 may be a volatile memory or a nonvolatile memory.
  • One or more programs can be stored in the memory 720, and each program may include a series of instruction operations in the computer equipment.
  • the processor 710 can communicate with the memory 720, and execute the series of instruction operations in the memory 720 on the computer equipment.
  • data of one or more operating systems may also be stored in the memory 720.
  • the computer equipment may further include one or more power supplies 730, one or more wired or wireless network interfaces 740, one or more input/output interfaces 750, etc.
  • the method and the device according to the present invention described above may be implemented in hardware or firmware, or implemented as software or computer codes that can be stored in a recording medium (e.g. CD, ROM, RAM, floppy disk, hard disk or magneto-optical disk), or implemented as computer codes that are originally stored in a remote recording medium or a non-transitory machine readable medium and can be downloaded through a network to be stored in a local recording medium, so that the method described herein can be processed by such software stored in the recording medium in a general purpose computer, a dedicated processor or programmable or dedicated hardware (e.g. ASIC or FPGA).
  • the computer, the processor, the microprocessor controller or the programmable hardware include a storage assembly (e.g. RAM, ROM, flash memory, etc.) that can store or receive software or computer codes.
  • when the general purpose computer accesses the codes for implementing the processing shown herein, the execution of the codes converts the general purpose computer into a dedicated computer for executing the processing illustrated herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Methods and apparatus for face hallucination are disclosed. According to an embodiment, a method for face hallucination comprises estimating a dense correspondence field based on a first image and a trained model; executing face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; and updating the first image with the second image, wherein the steps of estimating, executing and updating are performed repeatedly until the obtained second image has a desired resolution or the steps of estimating, executing and updating have been repeated for predetermined times.

Description

Methods and Apparatuses for Face Hallucination
Technical Field
The disclosure relates to image processing, in particular, to methods and apparatus for face hallucination.
Background
Increasing attention is devoted to the detection of small facial images with low image resolution, for example, as low as 10 pixels of height. Meanwhile, facial analysis techniques, such as face alignment and verification, have progressed rapidly. However, the performance of most existing techniques would degrade when a low-resolution facial image is given, because such an image naturally carries less information, and images corrupted by down-sampling and blur would interfere with the facial analysis procedure. Face hallucination is a task that improves the resolution of facial images and provides a viable means for improving low-resolution face processing and analysis, e.g., person identification in surveillance videos and facial image enhancement.
Summary
In one aspect of the present application, a method for face hallucination is provided, which comprises: estimating a dense correspondence field based on a first image and a trained model; executing face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second  image; and updating the first image with the second image, wherein the steps of estimating, executing and updating are performed repeatedly until the obtained second image has a desired resolution or the steps of estimating, executing and updating have been repeated for predetermined times.
According to another aspect of the present application, an apparatus for face hallucination is provided, which comprises: an estimating unit configured to estimate a dense correspondence field based on a first image and a trained model; and a hallucination unit configured to execute face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; wherein the first image is iteratively updated with the second image, and the estimating unit and the hallucination unit work for a predetermined number of iterations or until the obtained second image has a desired resolution.
In a further aspect of the present application, a device for face hallucination is provided, which comprises a processor and a memory storing computer-readable instructions, wherein, when the instructions are executed by the processor, the processor is operable to: estimate a dense correspondence field based on a first image and a trained model; execute face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; and update the first image with the second image, wherein the first image is iteratively updated with the second image for a predetermined number of iterations or until the obtained second image has a desired resolution.
In a further aspect of the present application, a nonvolatile storage medium containing computer-readable instructions is provided, wherein, when the instructions are executed by a processor, the processor is operable to estimate a dense correspondence field based on a first image and a trained model; execute face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; and update the first image with the second image, wherein the first image is iteratively updated with the second image for a predetermined number of iterations or until the obtained second image has a desired resolution.
Brief Description of Drawings
Fig. 1 is a flow chart of a method for face hallucination according to an embodiment of the present disclosure.
Fig. 2 illustrates an apparatus for face hallucination according to an embodiment of the present disclosure.
Fig. 3 illustrates a flow chart of the training process of the estimation unit according to an embodiment of the present application.
Fig. 4 illustrates a flow chart of the testing process of the estimation unit according to an embodiment of the present application.
Fig. 5 illustrates a flow chart of the training process of the hallucination unit according to an embodiment of the present application.
Fig. 6 illustrates a flow chart of the testing process of the hallucination unit according to an embodiment of the present application.
Fig. 7 is a structural schematic diagram of an embodiment of computer equipment provided by the present invention.
Detailed Description of Embodiments
According to an embodiment, a method for face hallucination is provided. Fig. 1 is a flow chart of a method 100 for face hallucination according to an embodiment of the present disclosure. According to another embodiment, an apparatus 200 for face hallucination is provided. Fig. 2 illustrates an apparatus 200 for face hallucination according to an embodiment of the present disclosure. As shown in Fig. 2, the apparatus 200 may comprise an estimation unit 201 and a hallucination unit 202.
As shown in Fig. 1, at step S101, a dense correspondence field is estimated by an estimation unit 201 based on an input first image 10 and parameters from a trained model 20. The first image input into the estimation unit may be a facial image with a low resolution. The dense correspondence field indicates the correspondence or mapping relationship of the first image to a warped image and denotes the warping of each pixel from the first image to the warped image. The trained model contains various parameters that may be used for the estimation of the dense correspondence field.
At step S102, face hallucination is executed by the hallucination unit 202 based on the first image 10 and the estimated dense correspondence field to obtain a second image 30. The second image obtained after the face hallucination on the first image usually has a resolution higher than that of the first image. The hallucination unit 202 is a bi-network which comprises a first branch 2021 being a common branch for face hallucination and a second branch 2022 being a high-frequency branch. The processing in the common branch is similar to the face hallucination in the prior art. In the high-frequency branch, the estimated dense correspondence field and parameters from the trained model 20 are further considered in addition to the input image 10. The results obtained from both branches are incorporated through a gate network 2023 to obtain the second image 30.
At step S103, the first image is updated with the second image so that the second image is used as an input to the estimation unit 201. Then, the steps S101 to S103 are performed repeatedly. For example, the steps may be performed repeatedly until the obtained second image has a desired image resolution. Alternatively, the steps may be performed for pre-defined times.
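The iterative scheme of steps S101 to S103 can be sketched as a simple driver loop. The following is an illustrative sketch only; `run_cascade`, `estimate_field` and `hallucinate` are hypothetical stand-ins for the estimation unit 201 and the hallucination unit 202, not the disclosed implementation.

```python
# Illustrative sketch of the cascaded loop of steps S101-S103.

def image_size(image):
    # image is a nested list: a list of pixel rows
    return (len(image), len(image[0]))

def run_cascade(image, model, desired_resolution, max_iterations):
    """Repeat estimate (S101) -> hallucinate (S102) -> update (S103)
    until the output reaches the desired resolution or the iteration
    budget is exhausted."""
    for _ in range(max_iterations):
        field = model["estimate_field"](image)       # step S101
        image = model["hallucinate"](image, field)   # steps S102 and S103
        if min(image_size(image)) >= desired_resolution:
            break
    return image

# Toy stand-ins: the "hallucination" here simply doubles each dimension.
toy_model = {
    "estimate_field": lambda img: None,  # dense field ignored in this toy
    "hallucinate": lambda img, field: [row * 2 for row in img for _ in (0, 1)],
}

result = run_cascade([[1]], toy_model, desired_resolution=4, max_iterations=10)
assert image_size(result) == (4, 4)
```

The stopping condition mirrors the two alternatives stated above: a resolution test and a fixed iteration budget.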
For example, the facial image may be denoted as a matrix I, and each pixel in the image may be denoted as x with coordinates (x, y) . A mean face template for the facial image may be denoted as M, which comprises a plurality of pixels z. The dense correspondence field indicates the mapping from pixels z in the mean face template M to pixels x in the  facial image I, which may be denoted by a warping function W (z) as x=W (z) . It is noted that the pixels in both images are considered in a 2D face region. The warping function W(z) may be determined based on a deformation coefficient p and a deformation base B(z) , which may be denoted as
W (z) =z+B (z) p           (1)
where p denotes the deformation coefficients and B (z) denotes the deformation bases. The bases are pre-defined and shared by all samples.
According to an embodiment, the deformation base B (z) is predefined and shared by all samples, and thus the warping function is actually controlled by the deformation coefficient p for each sample. For an initially input image, p is equal to 0, and thus the warping function W (z) =z, indicating that the dense correspondence field is the mean face template.
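A minimal numeric sketch of the warping function of equation (1), using a toy single-column basis (the actual bases are pre-defined in the trained model); it illustrates that p = 0 yields the identity warp, i.e., the mean face template.

```python
import numpy as np

# Sketch of W(z) = z + B(z) p from equation (1).
# z: (N, 2) template pixel coordinates; B: (N, 2, Np) toy deformation
# bases; p: (Np,) deformation coefficients.

def warp_field(z, B, p):
    # (N, 2, Np) @ (Np,) -> (N, 2): per-pixel displacement added to z
    return z + B @ p

z = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B = np.zeros((3, 2, 1))
B[:, 0, 0] = 1.0               # single toy basis: a uniform shift in x
p0 = np.zeros(1)

# With p = 0 the warp is the identity, so the dense correspondence
# field equals the mean face template, as stated above.
assert np.allclose(warp_field(z, B, p0), z)
```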
Taking K iterations (k iterating from 1 to K) as an example, all the notations are appended with the index k to indicate the iteration. A larger k in the notation of Ik, Wk, Bk and Mk indicates a larger resolution, and the same k indicates the same resolution. The whole process starts from I0 and p0, wherein I0 denotes the input low-resolution facial image and p0 is a zero vector representing the deformation coefficients of the mean face template. The final hallucinated facial image output is IK. The deformation coefficient pk, the warping function Wk (z) and the second image Ik are updated in each iteration. For example, the deformation coefficient pk and the warping function Wk (z) are updated by:
pk=pk-1+Rk (φ (↑Ik-1; pk-1) −φ̄) , Wk (z) =z+Bk (z) pk      (2)
wherein fk is a Gauss-Newton descent regressor learned and stored in the trained model for predicting the dense correspondence field coefficients, and is represented in equation (2) by the Gauss-Newton steepest descent regression matrix Rk, which is obtained by training. In equation (2) , φ is the shape-indexed feature that concatenates the local appearance from all L landmarks, and φ̄ is its average over all the training samples.
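The cascaded update of the deformation coefficients described above can be sketched numerically; here `R_k` and `phi_bar` are random stand-ins for the learned regression matrix and the training-set average feature, so only the update structure is illustrative.

```python
import numpy as np

# Toy sketch of one cascaded-regression step: the deformation
# coefficients are refined from the shape-indexed feature phi
# relative to its training-set average phi_bar.

rng = np.random.default_rng(0)
num_coeffs, feat_dim = 4, 16
R_k = 0.01 * rng.standard_normal((num_coeffs, feat_dim))  # stand-in regressor
phi_bar = rng.standard_normal(feat_dim)                   # stand-in average

def update_coefficients(p_prev, phi):
    return p_prev + R_k @ (phi - phi_bar)

# When the observed features already match the training-set average,
# the regressor proposes no change to the coefficients.
p0 = np.zeros(num_coeffs)
assert np.allclose(update_coefficients(p0, phi_bar), p0)
```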
In an embodiment, the dense correspondence field coefficients are estimated based on each pixel in the image. Alternatively, according to another embodiment, the dense correspondence field coefficients are estimated based on landmarks in the image, since using a sparse set of facial landmarks is more robust and accurate under low resolution. Under such circumstances, a landmark base Sk (l) is further considered in the estimation. In particular, two sets of deformation bases, i.e., the deformation base Bk (z) for the dense field and the landmark base Sk (l) for the landmarks, are obtained, where l is the landmark index. The bases for the dense field and the landmarks are one-to-one related, i.e., both Bk (z) and Sk (l) share the same deformation coefficients pk:
Wk (z) =z+Bk (z) pk, xl=zl+Sk (l) pk      (3)
where xl denotes the coordinates of the l-th landmark, and zl denotes its mean location.
For the face hallucination, the common branch conservatively recovers texture details that are only detectable from the low-resolution input, similar to general super-resolution. The high-frequency branch super-resolves faces with the additional high-frequency prior warped by the estimated face correspondence field in the current cascade. Thanks to the guidance of the prior, this branch is capable of recovering and synthesizing texture details that are not revealed in the overly low-resolution input image. A pixel-wise gate network is learned to fuse the results from the two branches.
According to an embodiment, the first image is upscaled and then input to the hallucination unit. In particular, the upscaled image is input to both the common branch and the high-frequency branch. In the common branch, the upscaled image is processed adaptively, for example, under a bicubic interpolation. In the high-frequency branch, the estimated dense correspondence field is further input, and the upscaled image is processed based on the estimated dense correspondence field. The results from both branches are combined in a gate network to obtain the second image. It is noted that the processing in the common branch is not limited to the bicubic interpolation, but may be any suitable process for the face hallucination.
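The fusion of the two branch outputs can be illustrated as a per-pixel convex combination. The sigmoid gate below is an assumption for illustration only; the disclosure specifies a learned pixel-wise gate network without fixing its form.

```python
import numpy as np

# Pixel-wise gate fusing the common branch and the high-frequency
# branch (illustrative; the real gate is a learned network).

def gate_fuse(common_out, highfreq_out, gate_logits):
    g = 1.0 / (1.0 + np.exp(-gate_logits))       # per-pixel weight in (0, 1)
    return g * highfreq_out + (1.0 - g) * common_out

common = np.full((2, 2), 0.2)
high = np.full((2, 2), 0.8)

# Zero logits -> gate 0.5 everywhere -> plain average of the branches.
assert np.allclose(gate_fuse(common, high, np.zeros((2, 2))), 0.5)
```

Large positive logits select the high-frequency branch, large negative logits the common branch, so the learned gate can fall back to conservative recovery where the prior is unreliable.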
For example, for the k-th iteration, the image Ik is obtained by:
Ik=↑Ik-1+gk (↑Ik-1; Wk (z) )    (4)
where gk represents a hallucination bi-network learned and stored in the trained model for face hallucination. The coefficients gk are obtained by training.
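Equation (4) is a residual update: the bi-network predicts a correction that is added to the upscaled input. The following sketch uses nearest-neighbor upscaling standing in for bicubic interpolation and a toy residual function in place of the learned gk.

```python
import numpy as np

# Residual form of equation (4): I_k = up(I_{k-1}) + g_k(up(I_{k-1}); W_k).

def upscale2x(img):
    # nearest-neighbor 2x upscaling (a stand-in for bicubic interpolation)
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def hallucination_step(img_prev, g_k, field=None):
    up = upscale2x(img_prev)
    return up + g_k(up, field)   # upscaled image plus predicted residual

# A zero residual reduces the step to plain upscaling.
zero_residual = lambda up, field: np.zeros_like(up)
out = hallucination_step(np.ones((2, 2)), zero_residual)
assert out.shape == (4, 4) and np.allclose(out, 1.0)
```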
It is noted that both the estimation unit and the hallucination unit may have a testing mode and a training mode. The method 100 shown in Fig. 1 illustrates the working process of the estimation unit and the hallucination unit in the testing mode. When working in the training mode, the estimation unit and the hallucination unit may perform a training process to obtain the parameters required in the testing mode and store them into the trained model. Herein, the estimation unit and the hallucination unit having both a testing mode and a training mode are described as an example. Alternatively, the training process and the testing process may be performed by separate apparatuses or separate units.
In the training process, two training sets, i.e., a hallucination training set and a correspondence field training set, are provided. Each of the two training sets includes a plurality of images, as well as down-sampled versions of each of the plurality of images in various scales. Ground-truth values of the deformation coefficients p are further included in the correspondence field training set. In contrast with the testing process described above, the images input in the training process have high resolution. Fig. 3 illustrates a flow chart of the training process 300 of the estimation unit according to an embodiment of the present application. As shown, at step S301, the dense bases Bk (z) , the landmark bases Sk (l) and the appearance eigen vectors Φk are obtained. These parameters, to be used in the following steps, are predefined. Meanwhile, the dense bases Bk (z) and the landmark bases Sk (l) are stored into the trained model for later use. At step S302, the average project-out Jacobian Jk is learned, for example, by minimizing the following loss:
Jk = argminJ Σi ‖ (φi − φ̄) − J (pi* − pk−1, i) ‖²    (5)

where pi* is the ground-truth deformation coefficient of the i-th training sample, pk−1, i is its estimate from the previous iteration, φ is the shape-indexed feature that concatenates the local appearance from all L landmarks, and φ̄ is its average over all the training samples.
At step S303, the Gauss-Newton steepest descent regression matrix Rk is calculated by:
Rk = (JkᵀJk) ⁻¹Jkᵀ    (6)
That is, the Gauss-Newton steepest descent regression matrix Rk is obtained from the Jacobian Jk via constructing the project-out Hessian Hk = JkᵀJk.
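Assuming the loss above is the standard least-squares Jacobian fit, learning Jk and deriving Rk from the project-out Hessian can be sketched as follows; the function name, the ridge term, and the array shapes are illustrative assumptions, not part of the application:

```python
import numpy as np

def learn_regressor(delta_phi, delta_p, ridge=1e-8):
    """Fit the average Jacobian Jk by least squares and derive the
    Gauss-Newton steepest descent regression matrix Rk.

    delta_phi: (N, D) centred shape-indexed features, phi_i - phi_bar.
    delta_p:   (N, K) coefficient residuals, p*_i - p_{k-1,i}.
    Returns (J, R): J of shape (D, K), R = (J^T J)^{-1} J^T of shape (K, D).
    """
    # Solve min_J ||delta_phi - delta_p @ J.T||^2 row-wise.
    J_T, *_ = np.linalg.lstsq(delta_p, delta_phi, rcond=None)  # (K, D)
    J = J_T.T
    H = J.T @ J + ridge * np.eye(J.shape[1])   # project-out Hessian, regularised
    R = np.linalg.solve(H, J.T)
    return J, R
```

By construction R applied to J Δp recovers Δp (up to the small ridge term), which is what makes the regression update pk = pk−1 + Rk (φ − φ̄) a Gauss-Newton step.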
Optionally, the process 300 may further include steps S304 and S305. At step S304, the deformation coefficients for both the correspondence field training set and the hallucination training set are updated. At step S305, the dense correspondence field for each location z in the hallucination training set is calculated. The deformation coefficients and the dense correspondence fields obtained at steps S304 and S305 may be used in the later training process.
Fig. 4 illustrates a flow chart of the testing process 400 of the estimation unit according to an embodiment of the present application. As shown, at step S401, the location of each landmark is obtained from the facial image input to the estimation unit. In the first iteration, the input image is the original low-resolution image. In each following iteration (for example, the k-th iteration) , the inputs are the image and the deformation coefficient obtained in the (k-1) -th iteration. Based on the landmark bases stored in the trained model, the location of each landmark in the input image is obtained.
At step S402, for each landmark, the SIFT feature around the location of the landmark is extracted; the SIFT feature serves as the shape-indexed feature described above. At step S403, the features from all the landmarks are concatenated into an appearance eigen vector. At step S404, the deformation coefficients are updated via regression according to the equation (2) . At step S405, the dense correspondence field for each location z is computed.
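Steps S401 to S405 can be sketched as one function. The layout of the model dictionary and the feature extractor hook (standing in for the SIFT descriptor) are assumptions for illustration only:

```python
import numpy as np

def estimation_step(image, p_prev, model, extract_feature):
    """One cascade iteration of the estimation unit (steps S401-S405).

    model: dict of per-iteration quantities from the trained model:
      'S'        (L, 2, K) landmark bases, 'mean_lmk' (L, 2) mean locations,
      'B'        (N, 2, K) dense bases,    'coords'   (N, 2) pixel grid,
      'R'        (K, D) regression matrix, 'phi_bar'  (D,) mean feature.
    extract_feature(image, xy) stands in for the SIFT descriptor at xy.
    """
    # S401: landmark locations implied by the current coefficients.
    lmks = model['mean_lmk'] + np.einsum('lij,j->li', model['S'], p_prev)
    # S402-S403: concatenate the local features into one appearance vector.
    phi = np.concatenate([extract_feature(image, xy) for xy in lmks])
    # S404: Gauss-Newton regression update of the deformation coefficients.
    p = p_prev + model['R'] @ (phi - model['phi_bar'])
    # S405: dense correspondence field Wk(z) = z + Bk(z) pk.
    W = model['coords'] + np.einsum('nij,j->ni', model['B'], p)
    return p, W
```

In the full cascade this function would be called once per iteration with that iteration's stored bases and regression matrix.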
Fig. 5 illustrates a flow chart of the training process 500 of the hallucination unit according to an embodiment of the present application. As shown, at step S501, images from the training sets are upsampled by bicubic interpolation. At step S502, the warped high-frequency prior is obtained according to the dense correspondence field. At step S503, the deep bi-network is trained in three steps: pre-training the common sub-network, pre-training the high-frequency sub-network, and tuning the whole bi-network end-to-end. In this step, the bi-network coefficient may be stored in the trained model. Then, at step S504, a forward pass of the bi-network may be performed to compute the predicted image for both the hallucination training set and the correspondence field training set.
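The three-step schedule of step S503 can be sketched as staged optimization. The trainer hook and the epoch counts below are placeholders, not prescribed by the application:

```python
def train_bi_network(common_net, highfreq_net, gate, data, train_fn,
                     epochs=(5, 5, 10)):
    """Three-stage training of the bi-network (step S503):
    1) pre-train the common sub-network alone,
    2) pre-train the high-frequency sub-network alone,
    3) fine-tune the whole bi-network (both branches plus gate) end-to-end.

    train_fn(params, data, epochs) stands in for any gradient-based
    optimiser and returns the updated parameters.
    """
    common_net = train_fn(common_net, data, epochs[0])     # stage 1
    highfreq_net = train_fn(highfreq_net, data, epochs[1])  # stage 2
    whole = {'common': common_net, 'high': highfreq_net, 'gate': gate}
    return train_fn(whole, data, epochs[2])                 # stage 3
```

Staging the training this way lets each branch reach a reasonable solution before the gate learns how to blend them.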
Fig. 6 illustrates a flow chart of the testing process 600 of the hallucination unit according to an embodiment of the present application. As shown, at step S601, an input image Ik-1 is upsampled by bicubic interpolation to obtain an upsampled image ↑ Ik-1. At step S602, the warped high-frequency prior is obtained according to the dense correspondence field. At step S603, the learned bi-network coefficient gk is used in a forward pass of the deep bi-network with the two inputs ↑ Ik-1 and the warped high-frequency prior, so that the image Ik is obtained.
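Steps S601 to S603 can be sketched as a single cascade step implementing Ik=↑ Ik-1+gk (↑ Ik-1; Wk (z) ) ; upsample, warp_prior and bi_network below are stand-ins for bicubic interpolation, the prior warping and the learned bi-network gk:

```python
import numpy as np

def hallucination_step(I_prev, W, upsample, warp_prior, bi_network):
    """One cascade iteration of the hallucination unit (steps S601-S603)."""
    up = upsample(I_prev)                 # S601: bicubic upsampling
    prior = warp_prior(W)                 # S602: warped high-frequency prior
    return up + bi_network(up, prior)     # S603: residual predicted by gk
```

Because the network only predicts a residual on top of the upsampled image, a zero-output network leaves the bicubic result unchanged, which matches the conservative behaviour of the common branch.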
It can be understood from the above description that the two tasks, i.e., the high-level face correspondence estimation and the low-level face hallucination, are complementary and can be alternatingly refined under the guidance of each other through a task-alternating cascaded framework. Experiments have been conducted, and improved results have been obtained.
Exemplary algorithms for training and testing according to the present application are listed below. Algorithm 1 is an exemplary training algorithm for learning the parameters by the apparatus according to an embodiment of the present application. Algorithm 2 is an exemplary testing algorithm for hallucinating a low-resolution face according to an embodiment of the present application.
Fig. 7 is a structural schematic diagram of an embodiment of computer equipment provided by the present invention.
With reference to Fig. 7, the computer equipment can be used for implementing the face hallucination method provided in the above embodiments. Specifically, the computer equipment may vary greatly in configuration and performance, and may include one or more processors (e.g. Central Processing Units, CPU) 710 and a memory 720. The memory 720 may be a volatile memory or a nonvolatile memory. One or more programs can be stored in the memory 720, and each program may include a series of instruction operations in the computer equipment. Further, the processor 710 can communicate with the memory 720, and execute the series of instruction operations in the memory 720 on the computer equipment. Particularly, data of one or more operating systems, e.g. Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc., are further stored in the memory 720. The computer equipment may further include one or more power supplies 730, one or more wired or wireless network interfaces 740, one or more input/output interfaces 750, etc.
The method and the device according to the present invention described above may be implemented in hardware or firmware, or implemented as software or computer codes which can be stored in a recording medium (e.g. CD, ROM, RAM, soft disk, hard disk or magneto-optical disk) , or implemented as computer codes which are originally stored in a remote recording medium or a non-transient machine readable medium and can be downloaded through a network to be stored in a local recording medium, so that the method described herein can be processed by such software stored in the recording medium in a general purpose computer, a dedicated processor or programmable or dedicated hardware (e.g. ASIC or FPGA) . It could be understood that the computer, the processor, the microprocessor controller or the programmable hardware include a storage assembly (e.g. RAM, ROM, flash memory, etc. ) capable of storing or receiving software or computer codes, and when the software or computer codes are accessed and executed by the computer, the processor or the hardware, the processing method described herein is implemented. Moreover, when the general purpose computer accesses the codes for implementing the processing shown herein, the execution of the codes converts the general purpose computer to a dedicated computer for executing the processing illustrated herein.
The foregoing descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or substitution readily conceivable to those skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Accordingly, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (16)

  1. A method for face hallucination, comprising:
    estimating a dense correspondence field based on a first image and a trained model;
    executing face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; and
    updating the first image with the second image,
    wherein the steps of estimating, executing and updating are performed repeatedly until the obtained second image has a desired resolution or the steps of estimating, executing and updating have been repeated a predetermined number of times.
  2. The method according to claim 1, wherein the bi-network comprises a first branch and a second branch, and the step of executing comprises:
    executing face hallucination based on the first image through the first branch to obtain a first result;
    executing face hallucination based on the first image, the estimated dense correspondence field and the trained model through the second branch to obtain a second result; and
    incorporating the first result and the second result to obtain the second image.
  3. The method of claim 1, wherein the trained model stores a dense base, a landmark base, a Gauss-Newton descent regressor for estimating the dense correspondence field,  and a bi-network coefficient for the face hallucination, wherein the dense base and the landmark base are predefined, and the Gauss-Newton descent regressor and the bi-network coefficient are learned by training.
  4. The method of claim 1, wherein the estimated dense correspondence field comprises a deformation coefficient p and a warping function W (z) for mapping a pixel z in a mean face image to a pixel x in the first image, wherein x=W (z) =z+Bp, and B is a predefined dense base.
  5. The method of claim 4, wherein for the repeated steps of estimating, executing and updating, the deformation coefficient p and the warping function W (z) are updated repeatedly.
  6. The method of claim 4, wherein for a k-th iteration, the first image is a (k-1) -th image Ik-1, the warping function is denoted as Wk (z) , and the second image is denoted as Ik and obtained by:
    Ik=↑ Ik-1+gk (↑ Ik-1; Wk (z) )
    wherein ↑ Ik-1 is an upscaled image of the (k-1) -th image Ik-1, and gk is a bi-network coefficient for the k-th iteration obtained from the trained model.
  7. The method of claim 4, wherein for a k-th iteration, the first image is a (k-1) -th image Ik-1, the deformation coefficient denoted as pk and the warping function denoted as Wk (z)  are obtained by:
    pk=pk-1+fk (Ik-1; pk-1)
    Wk (z) =z+Bkpk
    wherein pk-1 is the deformation coefficient obtained in the last iteration, fk is a Gauss-Newton descent regressor obtained from the trained model, and Bk is the predefined dense base for the k-th iteration obtained from the trained model.
  8. An apparatus for face hallucination, comprising:
    an estimation unit configured to estimate a dense correspondence field based on a first image and a trained model; and
    a hallucination unit configured to execute face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image;
    wherein the first image is iteratively updated with the second image, and the estimation unit and the hallucination unit work for a predetermined number of iterations or until the obtained second image has a desired resolution.
  9. The apparatus of claim 8, wherein the hallucination unit comprises:
    a first branch configured to execute face hallucination based on the first image to obtain a first result;
    a second branch configured to execute face hallucination based on the first image, the estimated dense correspondence field and the trained model to obtain a second result;  and
    a gate network configured to incorporate the first result and the second result to obtain the second image.
  10. The apparatus of claim 8, wherein the trained model stores a dense base, a landmark base, a Gauss-Newton descent regressor for estimating the dense correspondence field, and a bi-network coefficient for the face hallucination, wherein the dense base and the landmark base are predefined, and the Gauss-Newton descent regressor and the bi-network coefficient are learned by training.
  11. The apparatus of claim 8, wherein the estimated dense correspondence field comprises a deformation coefficient p and a warping function W (z) for mapping a pixel z in a mean face image to a pixel x in the first image, wherein x=W (z) =z+Bp, and B is a predefined dense base.
  12. The apparatus of claim 11, wherein for each time of the iterations, the deformation coefficient p and the warping function W (z) are updated repeatedly.
  13. The apparatus of claim 11, wherein for a k-th iteration, the first image is a (k-1) -th image Ik-1, the warping function is denoted as Wk (z) , and the second image obtained by the hallucination unit is denoted as Ik and obtained by:
    Ik=↑ Ik-1+gk (↑ Ik-1; Wk (z) )
    wherein ↑ Ik-1 is an upscaled image of the (k-1) -th image Ik-1, and gk is a bi-network coefficient for the k-th iteration obtained from the trained model.
  14. The apparatus of claim 11, wherein for a k-th iteration, the first image is a (k-1) -th image Ik-1, the deformation coefficient denoted as pk and the warping function denoted as Wk (z) are obtained by the estimation unit according to:
    pk=pk-1+fk (Ik-1; pk-1)
    Wk (z) =z+Bkpk
    wherein pk-1 is the deformation coefficient obtained in the last iteration, fk is a Gauss-Newton descent regressor obtained from the trained model, and Bk is the predefined dense base for the k-th iteration obtained from the trained model.
  15. A device for face hallucination, comprising:
    a processor; and
    a memory storing computer-readable instructions,
    wherein, when the instructions are executed by the processor, the processor is operable to:
    estimate a dense correspondence field based on a first image and a trained model;
    execute face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; and
    update the first image with the second image,
    wherein the first image is iteratively updated with the second image for a predetermined number of iterations or until the obtained second image has a desired resolution.
  16. A nonvolatile storage medium containing computer-readable instructions, wherein, when the instructions are executed by a processor, the processor is operable to:
    estimate a dense correspondence field based on a first image and a trained model;
    execute face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; and
    update the first image with the second image,
    wherein the first image is iteratively updated with the second image for a predetermined number of iterations or until the obtained second image has a desired resolution.
PCT/CN2016/078960 2016-04-11 2016-04-11 Methods and apparatuses for face hallucination Ceased WO2017177363A1 (en)

Priority Application: PCT/CN2016/078960, filed 2016-04-11 (national-phase application CN201680084409.3A).
Publication: WO2017177363A1, published 2017-10-19.



Also Published As

Publication number Publication date
CN109313795B (en) 2022-03-29
CN109313795A (en) 2019-02-05

