WO2017177363A1 - Methods and apparatuses for face hallucination - Google Patents
- Publication number: WO2017177363A1 (PCT/CN2016/078960)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- hallucination
- trained model
- dense
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/73—Deblurring; Sharpening
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/169—Holistic features and representations, i.e. based on the facial image taken as a whole
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Definitions
- Fig. 5 illustrates a flow chart of the training process 500 of the hallucination unit according to an embodiment of the present application.
- images from the training sets are upsampled by bicubic interpolation.
- the warped high-frequency prior is obtained according to the dense correspondence field.
- the deep bi-network is trained with three steps: pre-training the common sub-network, pre-training the high-frequency sub-network, and tuning the whole bi-network end-to-end.
- the bi-network coefficient may be stored in the trained model.
- a forward pass of the bi-network may be used to compute the predicted image for both the hallucination training set and the estimation training set.
- Fig. 6 illustrates a flow chart of the testing process 600 of the hallucination unit according to an embodiment of the present application.
- an input image I_{k-1} is upsampled by bicubic interpolation to obtain an upsampled image ↑I_{k-1}.
- the warped high-frequency prior is obtained according to the dense correspondence field.
- the learned bi-network coefficient g_k is used to forward pass the deep bi-network with two inputs, the upsampled image ↑I_{k-1} and the warped high-frequency prior, so that the image I_k is obtained.
- Algorithm 1 is an exemplary training algorithm for learning the parameters by the apparatus according to an embodiment of the present application.
- Algorithm 2 is an exemplary testing algorithm for hallucinating a low-resolution face according to an embodiment of the present application.
- Fig. 7 is a structural schematic diagram of an embodiment of computer equipment provided by the present invention.
- the computer equipment can be used for implementing the face hallucination method provided in the above embodiments.
- the computer equipment may vary considerably in configuration and performance, and may include one or more processors (e.g., Central Processing Units, CPUs) 710 and a memory 720.
- the memory 720 may be a volatile memory or a nonvolatile memory.
- One or more programs can be stored in the memory 720, and each program may include a series of instruction operations in the computer equipment.
- the processor 710 can communicate with the memory 720, and execute the series of instruction operations in the memory 720 on the computer equipment.
- data of one or more operating systems e.g.
- the computer equipment may further include one or more power supplies 730, one or more wired or wireless network interfaces 740, one or more input/output interfaces 750, etc.
- the method and the device according to the present invention described above may be implemented in hardware or firmware, or implemented as software or computer codes which can be stored in a recording medium (e.g., CD, ROM, RAM, floppy disk, hard disk or magneto-optical disk), or implemented as computer codes which are originally stored in a remote recording medium or a non-transient machine-readable medium and can be downloaded through a network to be stored in a local recording medium. In this way, the method described herein can be processed by such software stored in the recording medium in a general purpose computer, a dedicated processor, or programmable or dedicated hardware (e.g., an ASIC or FPGA).
- the computer, the processor, the microprocessor controller or the programmable hardware includes a storage assembly (e.g., RAM (random access memory), ROM (read-only memory), flash memory, etc.).
- when the general purpose computer accesses the codes for implementing the processing shown herein, the execution of the codes converts the general purpose computer into a dedicated computer for executing the processing illustrated herein.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Methods and apparatus for face hallucination are disclosed. According to an embodiment, a method for face hallucination comprises estimating a dense correspondence field based on a first image and a trained model; executing face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; and updating the first image with the second image, wherein the steps of estimating, executing and updating are performed repeatedly until the obtained second image has a desired resolution or the steps of estimating, executing and updating have been repeated a predetermined number of times.
Description
The disclosure relates to image processing, in particular, to methods and apparatus for face hallucination.
Increasing attention is devoted to detection of small facial images with low image resolution, for example, as low as 10 pixels of height. Meanwhile, facial analysis techniques, such as face alignment and verification, have progressed rapidly. However, the performance of most existing techniques degrades when a low-resolution facial image is given, because such an image naturally carries less information, and images corrupted with down-sampling and blur would interfere with the facial analysis procedure. Face hallucination is a task that improves the resolution of facial images and provides a viable means for improving low-resolution face processing and analysis, e.g., person identification in surveillance videos and facial image enhancement.
Summary
In one aspect of the present application, a method for face hallucination is provided, which comprises: estimating a dense correspondence field based on a first image and a trained model; executing face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; and updating the first image with the second image, wherein the steps of estimating, executing and updating are performed repeatedly until the obtained second image has a desired resolution or the steps of estimating, executing and updating have been repeated a predetermined number of times.
According to another aspect of the present application, an apparatus for face hallucination is provided, which comprises: an estimation unit configured to estimate a dense correspondence field based on a first image and a trained model; and a hallucination unit configured to execute face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; wherein the first image is iteratively updated with the second image, and the estimation unit and the hallucination unit work for a predetermined number of iterations or until the obtained second image has a desired resolution.
In a further aspect of the present application, a device for face hallucination is provided, which comprises a processor and a memory storing computer-readable instructions, wherein, when the instructions are executed by the processor, the processor is operable to: estimate a dense correspondence field based on a first image and a trained model; execute face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; and update the first image with the second image, wherein the first image is iteratively updated with the second image for a predetermined number of iterations or until the obtained second image has a desired resolution.
In a further aspect of the present application, a nonvolatile storage medium containing computer-readable instructions is provided, wherein, when the instructions are executed by a processor, the processor is operable to: estimate a dense correspondence field based on a first image and a trained model; execute face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; and update the first image with the second image, wherein the first image is iteratively updated with the second image for a predetermined number of iterations or until the obtained second image has a desired resolution.
Brief Description of Drawings
Fig. 1 is a flow chart of a method for face hallucination according to an embodiment of the present disclosure.
Fig. 2 illustrates an apparatus for face hallucination according to an embodiment of the present disclosure.
Fig. 3 illustrates a flow chart of the training process of the estimation unit according to an embodiment of the present application.
Fig. 4 illustrates a flow chart of the testing process of the estimation unit according to an embodiment of the present application.
Fig. 5 illustrates a flow chart of the training process of the hallucination unit according to an embodiment of the present application.
Fig. 6 illustrates a flow chart of the testing process of the hallucination unit according to an embodiment of the present application.
Fig. 7 is a structural schematic diagram of an embodiment of computer equipment provided by the present invention.
Detailed Description of Embodiments
According to an embodiment, a method for face hallucination is provided. Fig. 1 is a flow chart of a method 100 for face hallucination according to an embodiment of the present disclosure. According to another embodiment, an apparatus 200 for face hallucination is provided. Fig. 2 illustrates an apparatus 200 for face hallucination according to an embodiment of the present disclosure. As shown in Fig. 2, the apparatus 200 may comprise an estimation unit 201 and a hallucination unit 202.
As shown in Fig. 1, at step S101, a dense correspondence field is estimated by an estimation unit 201 based on an input first image 10 and parameters from a trained model 20. The first image input into the estimation unit may be a facial image with a low resolution. The dense correspondence field indicates the correspondence or mapping relationship of the first image to a warped image and denotes the warping of each pixel from the first image to the warped image. The trained model contains various parameters that may be used for the estimation of the dense correspondence field.
At step S102, face hallucination is executed by the hallucination unit 202 based on the first image 10 and the estimated dense correspondence field to obtain a second image 30. The second image obtained after the face hallucination on the first image usually has a resolution higher than that of the first image. The hallucination unit 202 is a bi-network which comprises a first branch 2021 being a common branch for face hallucination and a second branch 2022 being a high-frequency branch. The processing in the common branch is similar to the face hallucination in the prior art. In the high-frequency branch, the estimated dense correspondence field and parameters from the trained model 20 are further considered in addition to the input image 10. The results obtained from both branches are incorporated through a gate network 2023 to obtain the second image 30.
At step S103, the first image is updated with the second image so that the second image is used as an input to the estimation unit 201. Then, the steps S101 to S103 are performed repeatedly. For example, the steps may be performed repeatedly until the obtained second image has a desired image resolution. Alternatively, the steps may be performed a predefined number of times.
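The iterative loop of steps S101 to S103 can be sketched in a few lines of Python. Here `estimate_field` and `hallucinate` are hypothetical stand-ins for the trained estimation and hallucination units (the toy implementations below merely return a zero field and double the resolution by pixel replication); they are illustrations, not the disclosure's actual networks.

```python
import numpy as np

def run_cascade(image, estimate_field, hallucinate,
                desired_size=128, max_iters=4):
    """Iterate field estimation and hallucination (steps S101-S103)
    until the image reaches the desired resolution or the iteration
    budget is exhausted."""
    for _ in range(max_iters):
        field = estimate_field(image)       # S101: dense correspondence field
        image = hallucinate(image, field)   # S102: bi-network hallucination
        if image.shape[0] >= desired_size:  # S103: update + stopping test
            break
    return image

# Toy stand-ins: a zero displacement map, and 2x pixel replication.
toy_estimate = lambda img: np.zeros(img.shape + (2,))
toy_hallucinate = lambda img, fld: np.kron(img, np.ones((2, 2)))

out = run_cascade(np.ones((16, 16)), toy_estimate, toy_hallucinate)
```

Starting from a 16-by-16 input, three doubling passes reach the 128-pixel target and the loop stops early, mirroring the "desired resolution or predetermined times" stopping rule.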
For example, the facial image may be denoted as a matrix I, and each pixel in the image may be denoted as x with coordinates (x, y). A mean face template for the facial image may be denoted as M, which comprises a plurality of pixels z. The dense correspondence field indicates the mapping from pixels z in the mean face template M to pixels x in the facial image I, which may be denoted by a warping function W(z) as x = W(z). It is noted that the pixels in both images are considered in a 2D face region. The warping function W(z) may be determined based on a deformation coefficient p and a deformation base B(z), which may be denoted as

W(z) = z + B(z) p    (1)
where p denotes the deformation coefficients and B(z) denotes the deformation bases. The bases are pre-defined and shared by all samples.
According to an embodiment, the deformation base B(z) is predefined and shared by all samples, and thus the warping function is actually controlled by the deformation coefficient p for each sample. For an initially input image, p is equal to 0, and thus the warping function W(z) = z, indicating that the dense correspondence field is the mean face template.
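Equation (1) can be exercised directly in NumPy. The (N, 2, n) layout chosen for the basis tensor is an assumption made for this sketch; with p = 0 the warp reduces to the identity, i.e. the mean face template, as noted above.

```python
import numpy as np

def warp(z, B, p):
    """W(z) = z + B(z) p: displace each template pixel z by a linear
    combination of the deformation bases weighted by coefficients p.

    z: (N, 2) template coordinates; B: (N, 2, n) deformation bases
    (assumed layout); p: (n,) per-sample deformation coefficients."""
    return z + np.einsum('ijk,k->ij', B, p)

N, n = 5, 3
z = np.arange(N * 2, dtype=float).reshape(N, 2)
B = np.random.default_rng(0).standard_normal((N, 2, n))
warped = warp(z, B, np.ones(n))
```

Because the bases B are shared across samples, only the small coefficient vector p has to be estimated per face, which is what makes the cascaded regression below tractable.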
Taking K iterations (k iterates from 1 to K) as an example, all the notations are appended with the index k to indicate the iteration. A larger k in the notation of I_k, W_k, B_k and M_k indicates a larger resolution, and the same k indicates the same resolution. The whole process starts from I_0 and p_0, wherein I_0 denotes the input low-resolution facial image and p_0 is a zero vector representing the deformation coefficients of the mean face template. The final hallucinated facial image output is I_K. The deformation coefficient p_k, the warping function W_k(z) and the second image I_k are updated in each iteration.
For example, the deformation coefficient p_k and the warping function W_k(z) are updated by:

p_k = p_{k-1} + f_k(↑I_{k-1}; p_{k-1})    (2)

wherein f_k is a Gauss-Newton descent regressor learned and stored in the trained model for predicting the dense correspondence field coefficients. The regressor f_k may further be represented by a Gauss-Newton steepest descent regression matrix R_k, which is obtained by training. In the equation (2), φ is the shape-indexed feature that concatenates the local appearance from all L landmarks, and φ̄ is its average over all the training samples.
In an embodiment, the dense correspondence field coefficients are estimated based on each pixel in the image. Alternatively, according to another embodiment, the dense correspondence field coefficients are estimated based on landmarks in the image, since using a sparse set of facial landmarks is more robust and accurate under low resolution. Under such circumstances, a landmark base S_k(l) is further considered in the estimation. In particular, two sets of deformation bases, i.e., the deformation base B_k(z) for the dense field and the landmark base S_k(l) for the landmarks, are obtained, where l is the landmark index. The bases for the dense field and landmarks are one-to-one related, i.e., both B_k(z) and S_k(l) share the same deformation coefficients p_k:

W_k(z) = z + B_k(z) p_k    (3)
For the face hallucination, the common branch conservatively recovers texture details that are only detectable from the low-resolution input, which is similar to general super-resolution. The high-frequency branch super-resolves faces with the additional high-frequency prior warped by the estimated face correspondence field in the current cascade. Thanks to the guidance of the prior, this branch is capable of recovering and synthesizing texture details that are not revealed in the overly low-resolution input image. A pixel-wise gate network is learned to fuse the results from the two branches.
According to an embodiment, the first image is upscaled and then input to the hallucination unit. In particular, the upscaled image is input to both the common branch and the high-frequency branch. In the common branch, the upscaled image is processed adaptively, for example, by bicubic interpolation. In the high-frequency branch, the estimated dense correspondence field is further input, and the upscaled image is processed based on the estimated dense correspondence field. The results from both branches are combined in a gate network to obtain the second image. It is noted that the processing in the common branch is not limited to bicubic interpolation, but may be any suitable process for face hallucination.
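One plausible reading of the pixel-wise gate fusion is a sigmoid gate that blends the two branch outputs element-wise; the disclosure does not fix the gate architecture, so the sketch below is only an assumption about its general shape.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_fuse(common, high_freq, gate_logits):
    """Blend the two branch outputs with a learned pixel-wise gate:
    G in [0, 1] selects, per pixel, how much of the high-frequency
    branch to trust over the conservative common branch."""
    g = sigmoid(gate_logits)
    return g * high_freq + (1.0 - g) * common

h = w = 8
common = np.zeros((h, w))     # stand-in for the common-branch output
high = np.ones((h, w))        # stand-in for the high-frequency output
fused = gate_fuse(common, high, np.full((h, w), 10.0))  # gate ~ 1 everywhere
```

With large positive logits the gate saturates toward the high-frequency branch; around face regions where the warped prior is unreliable, a trained gate would instead fall back to the common branch.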
For example, for the k-th iteration, the image I_k is obtained by:

I_k = ↑I_{k-1} + g_k(↑I_{k-1}; W_k(z))    (4)

where g_k represents a hallucination bi-network learned and stored in the trained model for face hallucination. The coefficients of g_k are obtained by training.
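Equation (4) is a residual formulation: the bi-network only has to predict the detail added on top of the upscaled image. A minimal sketch, with pixel replication standing in for bicubic upsampling and a hypothetical residual network `g`:

```python
import numpy as np

def upscale2x(img):
    """Placeholder for the bicubic upsampling step; pixel replication
    keeps the sketch dependency-free."""
    return np.kron(img, np.ones((2, 2)))

def hallucinate_step(img, g):
    """I_k = up(I_{k-1}) + g_k(up(I_{k-1})): the network predicts a
    residual correction to the plainly upscaled image."""
    up = upscale2x(img)
    return up + g(up)

# With a zero residual network, the step reduces to plain upsampling.
step = hallucinate_step(np.ones((4, 4)), lambda up: np.zeros_like(up))
```

The design choice matters for training stability: since ↑I_{k-1} already carries the low-frequency content, g_k can concentrate its capacity on the missing high-frequency detail.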
It is noted that both the estimation unit and the hallucination unit may have a testing mode and a training mode. The method 100 as shown in Fig. 1 illustrates the working process of the estimation unit and hallucination unit in the testing mode. When working in the training mode, the estimation unit and hallucination unit may perform a training process to obtain and store parameters required in the testing mode into the trained model. Herein, the estimation unit and the hallucination unit having both a testing mode and a training mode are described as an example. Alternatively, the training process and the testing process may be performed by separate apparatus or separate units.
In the training process, two training sets, i.e., a hallucination training set and a correspondence field training set are provided. Each of the two training sets includes a plurality of images, as well as the down-sampled images in various scales for each of the plurality of images. Ground-truth coefficients for the deformation coefficient p are further included in the correspondence field training set. In contrast with the testing process as described above, images input in the training process have high resolution.
Fig. 3 illustrates a flow chart of the training process 300 of the estimation unit according to an embodiment of the present application. As shown, at step S301, the dense bases B_k(z), the landmark bases S_k(l) and appearance eigenvectors Φ_k are obtained. These parameters, to be used in the following steps, are predefined. Meanwhile, the dense bases B_k(z) and the landmark bases S_k(l) are stored into the trained model for later use. At step S302, the average project-out Jacobian J_k is learned, for example, by minimizing the following loss over the training samples:

min_{J_k} Σ_i ||φ_i − φ̄ − J_k Δp_i||²

where φ is the shape-indexed feature that concatenates the local appearance from all L landmarks, φ̄ is its average over all the training samples, and Δp_i is the ground-truth update of the deformation coefficients for sample i.
At step S303, the Gauss-Newton steepest descent regression matrix Rk is calculated by:
In the above equations, the Jacobian Jk and the Gauss-Newton steepest descent regression matrix Rk are obtained via constructing the project-out Hessian.
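The regression matrix of step S303 can be illustrated under the standard Gauss-Newton assumption (the exact formula in the patent's equation is not reproduced in this excerpt): with the project-out Hessian H = JᵀJ, the steepest-descent regression matrix takes the form R = H⁻¹Jᵀ. A minimal numpy sketch, assuming this standard form and omitting the project-out step that removes appearance variation:

```python
import numpy as np

def gauss_newton_regressor(J):
    """Assumed standard Gauss-Newton steepest-descent regression matrix
    R_k = (J_k^T J_k)^{-1} J_k^T, built from the Hessian H = J^T J."""
    H = J.T @ J                      # (project-out) Hessian
    return np.linalg.solve(H, J.T)   # R = H^{-1} J^T, solved without explicit inversion

# R is a left inverse of J: for any full-column-rank J, R @ J equals the identity.
J = np.random.rand(100, 8)           # illustrative Jacobian: 100 features, 8 coefficients
R = gauss_newton_regressor(J)
assert np.allclose(R @ J, np.eye(8))
```

Using `np.linalg.solve` rather than explicitly inverting H is the numerically preferred way to apply the Hessian inverse.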
Optionally, the process 300 may further include steps S304 and S305. At step S304, the deformation coefficients for both the correspondence training set and the hallucination training set are updated. At step S305, the dense correspondence field for each location z in the hallucination training set is calculated. The deformation coefficients and the dense correspondence fields obtained at steps S304 and S305 may be used in the later training process.
Fig. 4 illustrates a flow chart of the testing process 400 of the estimation unit according to an embodiment of the present application. As shown, at step S401, the location of each landmark is obtained from the facial image input to the estimation unit. In the first iteration, the input image is the original low-resolution image. In each subsequent iteration (for example, the k-th iteration), the inputs are the image and the deformation coefficient obtained in the (k-1) -th iteration. Based on the landmark base stored in the trained model, the location of each landmark in the input image is obtained.
At step S402, for each landmark, the SIFT feature around the location of the landmark is extracted. The SIFT feature is the shape-indexed feature described above. At step S403, the features from all the landmarks are combined into an appearance eigenvector. At step S404, the deformation coefficients are updated via regression according to equation (2) . At step S405, the dense correspondence field for each location z is computed.
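Steps S404 and S405 can be sketched as one regression iteration. All shapes, the update form p ← p + R·(φ̄ − φ), and the flattened coordinate grid are illustrative assumptions (the patent's equation (2) is not reproduced in this excerpt); in the patent, the regressor and the dense base Bk come from the trained model.

```python
import numpy as np

n_coef, n_feat, grid = 5, 128, 10 * 10  # illustrative dimensions
B = np.random.rand(2 * grid, n_coef)    # predefined dense base B_k (x and y offsets)
R = np.random.rand(n_coef, n_feat)      # learned Gauss-Newton descent regressor
phi_mean = np.random.rand(n_feat)       # average shape-indexed feature over training data

def estimation_step(p_prev, phi):
    """S404: regress the deformation coefficient; S405: dense field W(z) = z + B p.

    `phi` is the shape-indexed (SIFT) feature extracted from the input image;
    the additive update form is an assumption based on the surrounding text."""
    p = p_prev + R @ (phi_mean - phi)   # assumed regression update of p
    z = np.arange(2 * grid, dtype=float)  # toy flattened mean-face pixel coordinates
    W = z + B @ p                       # dense correspondence field for every location z
    return p, W

p, W = estimation_step(np.zeros(n_coef), phi=np.random.rand(n_feat))
assert p.shape == (n_coef,) and W.shape == (2 * grid,)
```

The dense field W maps every pixel of the mean face into the input image, which is what the hallucination unit then uses to warp the high-frequency prior.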
Fig. 5 illustrates a flow chart of the training process 500 of the hallucination unit according to an embodiment of the present application. As shown, at step S501, images from the training sets are upsampled by bicubic interpolation. At step S502, the warped high-frequency prior is obtained according to the dense correspondence field. At step S503, the deep bi-network is trained in three steps: pre-training the common sub-network, pre-training the high-frequency sub-network, and tuning the whole bi-network end-to-end. In this step, the bi-network coefficient may be stored in the trained model. Then, at step S504, a forward pass of the bi-network is performed to compute the predicted image for both the hallucination training set and the estimation training set.
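The bi-network of step S503 combines a common branch and a high-frequency branch; claim 9 below describes a gate network that incorporates the two results. A toy sketch of the forward pass, with hypothetical stand-in callables for the trained sub-networks and an assumed per-pixel blending gate:

```python
import numpy as np

def bi_network_forward(upscaled, hf_prior, common_net, hf_net, gate_net):
    """Forward pass of the bi-network: the common branch sees the upscaled image,
    the high-frequency branch sees the warped prior, and a gate network blends
    the two branch results into the output. The blending form is an assumption."""
    first = common_net(upscaled)    # first-branch result (generic hallucination)
    second = hf_net(hf_prior)       # second-branch result (prior-guided hallucination)
    g = gate_net(upscaled)          # assumed per-pixel blend weights in [0, 1]
    return g * first + (1.0 - g) * second

# Toy sub-networks: identity branches and a uniform 0.5 gate yield the average.
out = bi_network_forward(np.full((4, 4), 2.0), np.full((4, 4), 4.0),
                         common_net=lambda x: x, hf_net=lambda x: x,
                         gate_net=lambda x: np.full_like(x, 0.5))
assert np.allclose(out, 3.0)
```

The three-step training schedule in S503 (pre-train each branch, then tune end-to-end) mirrors this structure: each branch is first useful on its own before the gate learns to combine them.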
Fig. 6 illustrates a flow chart of the testing process 600 of the hallucination unit according to an embodiment of the present application. As shown, at step S601, an input image Ik-1 is upsampled by bicubic interpolation to obtain an upsampled image ↑Ik-1. At step S602, the warped high-frequency prior is obtained according to the dense correspondence field. At step S603, the learned bi-network coefficient gk is used to perform a forward pass of the deep bi-network with the two inputs ↑Ik-1 and the warped high-frequency prior, so that the image Ik is obtained.
It is understood from the above description that the two tasks, i.e., the high-level face correspondence estimation and the low-level face hallucination, are complementary and can be alternately refined with guidance from each other through a task-alternating cascaded framework. Experiments have been conducted, and improved results were obtained.
Exemplary algorithms for training and testing according to the present application are listed below. Algorithm 1 is an exemplary training algorithm for learning the parameters by the apparatus according to an embodiment of the present application. Algorithm 2 is an exemplary testing algorithm for hallucinating a low-resolution face according to an embodiment of the present application.
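Since Algorithms 1 and 2 themselves are not reproduced in this excerpt, the overall testing loop (the alternation of the estimation unit and the hallucination unit across iterations) can be sketched as follows. `model` is a hypothetical container of per-iteration parameters; its `estimate` and `hallucinate` callables stand in for steps S401-S405 and S601-S603 respectively.

```python
import numpy as np
from types import SimpleNamespace

def hallucinate(low_res, model, num_iters=3):
    """Task-alternating cascade: each iteration refines the dense correspondence
    field (estimation unit), then hallucinates a higher-resolution image
    (hallucination unit) guided by that field."""
    image, p = low_res, model.initial_p
    for k in range(1, num_iters + 1):
        p, warp = model.estimate(k, image, p)      # estimation unit, steps S401-S405
        image = model.hallucinate(k, image, warp)  # hallucination unit, steps S601-S603
    return image

# Toy model: estimation is a no-op, hallucination doubles the resolution each pass.
toy = SimpleNamespace(
    initial_p=np.zeros(5),
    estimate=lambda k, img, p: (p, None),
    hallucinate=lambda k, img, warp: np.kron(img, np.ones((2, 2))),
)
out = hallucinate(np.random.rand(8, 8), toy)
assert out.shape == (64, 64)
```

Because both p and the image are threaded through the loop, each task starts its iteration from the other task's latest refinement, which is the complementarity described above.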
Fig. 7 is a structural schematic diagram of an embodiment of computer equipment provided by the present invention.
With reference to Fig. 7, the computer equipment can be used to implement the face hallucination method provided in the above embodiments. Specifically, the computer equipment may differ greatly in configuration or performance, and may include one or more processors (e.g., central processing units, CPUs) 710 and a memory 720. The memory 720 may be a volatile memory or a nonvolatile memory. One or more programs can be stored in the memory 720, and each program may include a series of instruction operations for the computer equipment. Further, the processor 710 can communicate with the memory 720 and execute the series of instruction operations in the memory 720 on the computer equipment. In particular, data of one or more operating systems, e.g., Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc., are further stored in the memory 720. The computer equipment may further include one or more power supplies 730, one or more wired or wireless network interfaces 740, one or more input/output interfaces 750, etc.
The method and the device according to the present invention described above may be implemented in hardware or firmware, or implemented as software or computer code that can be stored in a recording medium (e.g., CD, ROM, RAM, floppy disk, hard disk or magneto-optical disk), or implemented as computer code that is originally stored in a remote recording medium or a non-transient machine-readable medium and can be downloaded through a network to be stored in a local recording medium, so that the method described herein can be processed by such software stored in the recording medium on a general-purpose computer, a dedicated processor, or programmable or dedicated hardware (e.g., an ASIC or FPGA). It can be understood that the computer, the processor, the microprocessor controller or the programmable hardware includes a storage assembly (e.g., RAM, ROM, flash memory, etc.) capable of storing or receiving software or computer code, and when the software or computer code is accessed and executed by the computer, the processor or the hardware, the processing method described herein is implemented. Moreover, when the general-purpose computer accesses the code for implementing the processing shown herein, the execution of the code converts the general-purpose computer into a dedicated computer for executing the processing illustrated herein.
The foregoing descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or substitution readily conceivable to those skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Accordingly, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (16)
- A method for face hallucination, comprising:
estimating a dense correspondence field based on a first image and a trained model;
executing face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; and
updating the first image with the second image,
wherein the steps of estimating, executing and updating are performed repeatedly until the obtained second image has a desired resolution or until the steps of estimating, executing and updating have been repeated a predetermined number of times.
- The method according to claim 1, wherein the bi-network comprises a first branch and a second branch, and the step of executing comprises:
executing face hallucination based on the first image through the first branch to obtain a first result;
executing face hallucination based on the first image, the estimated dense correspondence field and the trained model through the second branch to obtain a second result; and
incorporating the first result and the second result to obtain the second image.
- The method of claim 1, wherein the trained model stores a dense base, a landmark base, a Gauss-Newton descent regressor for estimating the dense correspondence field, and a bi-network coefficient for the face hallucination, wherein the dense base and the landmark base are predefined, and the Gauss-Newton descent regressor and the bi-network coefficient are learned by training.
- The method of claim 1, wherein the estimated dense correspondence field comprises a deformation coefficient p and a warping function W(z) for mapping a pixel z in a mean face image to a pixel x in the first image, wherein x = W(z) = z + Bp, and B is a predefined dense base.
- The method of claim 4, wherein for the repeated steps of estimating, executing and updating, the deformation coefficient p and the warping function W(z) are updated repeatedly.
- The method of claim 4, wherein for a k-th iteration, the first image is a (k-1)-th image Ik-1, the warping function is denoted as Wk(z), and the second image is denoted as Ik and obtained by:
Ik = ↑Ik-1 + gk(↑Ik-1; Wk(z))
wherein ↑Ik-1 is an upscaled image of the (k-1)-th image Ik-1, and gk is a bi-network coefficient for the k-th iteration obtained from the trained model.
- The method of claim 4, wherein for a k-th iteration, the first image is a (k-1)-th image Ik-1, and the deformation coefficient, denoted as pk, and the warping function, denoted as Wk(z), are obtained by:
pk = pk-1 + fk(Ik-1; pk-1)
Wk(z) = z + Bk pk
wherein pk-1 is the deformation coefficient obtained in the previous iteration, fk is a Gauss-Newton descent regressor obtained from the trained model, and Bk is the predefined dense base for the k-th iteration obtained from the trained model.
- An apparatus for face hallucination, comprising:
an estimation unit configured to estimate a dense correspondence field based on a first image and a trained model; and
a hallucination unit configured to execute face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image;
wherein the first image is iteratively updated with the second image, and the estimation unit and the hallucination unit work for a predetermined number of iterations or until the obtained second image has a desired resolution.
- The apparatus of claim 8, wherein the hallucination unit comprises:
a first branch configured to execute face hallucination based on the first image to obtain a first result;
a second branch configured to execute face hallucination based on the first image, the estimated dense correspondence field and the trained model to obtain a second result; and
a gate network configured to incorporate the first result and the second result to obtain the second image.
- The apparatus of claim 8, wherein the trained model stores a dense base, a landmark base, a Gauss-Newton descent regressor for estimating the dense correspondence field, and a bi-network coefficient for the face hallucination, wherein the dense base and the landmark base are predefined, and the Gauss-Newton descent regressor and the bi-network coefficient are learned by training.
- The apparatus of claim 8, wherein the estimated dense correspondence field comprises a deformation coefficient p and a warping function W(z) for mapping a pixel z in a mean face image to a pixel x in the first image, wherein x = W(z) = z + Bp, and B is a predefined dense base.
- The apparatus of claim 11, wherein for each iteration, the deformation coefficient p and the warping function W(z) are updated repeatedly.
- The apparatus of claim 11, wherein for a k-th iteration, the first image is a (k-1)-th image Ik-1, the warping function is denoted as Wk(z), and the second image obtained by the hallucination unit is denoted as Ik and obtained by:
Ik = ↑Ik-1 + gk(↑Ik-1; Wk(z))
wherein ↑Ik-1 is an upscaled image of the (k-1)-th image Ik-1, and gk is a bi-network coefficient for the k-th iteration obtained from the trained model.
- The apparatus of claim 11, wherein for a k-th iteration, the first image is a (k-1)-th image Ik-1, and the deformation coefficient, denoted as pk, and the warping function, denoted as Wk(z), are obtained by the estimation unit according to:
pk = pk-1 + fk(Ik-1; pk-1)
Wk(z) = z + Bk pk
wherein pk-1 is the deformation coefficient obtained in the previous iteration, fk is a Gauss-Newton descent regressor obtained from the trained model, and Bk is the predefined dense base for the k-th iteration obtained from the trained model.
- A device for face hallucination, comprising:
a processor; and
a memory storing computer-readable instructions,
wherein, when the instructions are executed by the processor, the processor is operable to:
estimate a dense correspondence field based on a first image and a trained model;
execute face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; and
update the first image with the second image,
wherein the first image is iteratively updated with the second image for a predetermined number of iterations or until the obtained second image has a desired resolution.
- A nonvolatile storage medium containing computer-readable instructions, wherein, when the instructions are executed by a processor, the processor is operable to:
estimate a dense correspondence field based on a first image and a trained model;
execute face hallucination based on the first image, the estimated dense correspondence field and the trained model through a bi-network to obtain a second image; and
update the first image with the second image,
wherein the first image is iteratively updated with the second image for a predetermined number of iterations or until the obtained second image has a desired resolution.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2016/078960 WO2017177363A1 (en) | 2016-04-11 | 2016-04-11 | Methods and apparatuses for face hallucination |
| CN201680084409.3A CN109313795B (en) | 2016-04-11 | 2016-04-11 | Method and apparatus for super-resolution processing |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2016/078960 WO2017177363A1 (en) | 2016-04-11 | 2016-04-11 | Methods and apparatuses for face hallucination |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017177363A1 true WO2017177363A1 (en) | 2017-10-19 |
Family
ID=60041336
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2016/078960 Ceased WO2017177363A1 (en) | 2016-04-11 | 2016-04-11 | Methods and apparatuses for face hallucination |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN109313795B (en) |
| WO (1) | WO2017177363A1 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112001861B (en) * | 2020-08-18 | 2024-04-02 | 香港中文大学(深圳) | Image processing method and device, computer equipment and storage medium |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070103595A1 (en) * | 2005-10-27 | 2007-05-10 | Yihong Gong | Video super-resolution using personalized dictionary |
| US20110305404A1 (en) * | 2010-06-14 | 2011-12-15 | Chia-Wen Lin | Method And System For Example-Based Face Hallucination |
| CN103208109A (en) * | 2013-04-25 | 2013-07-17 | 武汉大学 | Local restriction iteration neighborhood embedding-based face hallucination method |
| US20150363634A1 (en) * | 2014-06-17 | 2015-12-17 | Beijing Kuangshi Technology Co.,Ltd. | Face Hallucination Using Convolutional Neural Networks |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103530863B (en) * | 2013-10-30 | 2017-01-11 | 广东威创视讯科技股份有限公司 | Multistage reconstruction image super resolution method |
| CN104091320B (en) * | 2014-07-16 | 2017-03-29 | 武汉大学 | Based on the noise face super-resolution reconstruction method that data-driven local feature is changed |
| CN105405113A (en) * | 2015-10-23 | 2016-03-16 | 广州高清视信数码科技股份有限公司 | Image super-resolution reconstruction method based on multi-task Gaussian process regression |
- 2016-04-11: CN application CN201680084409.3A, patent CN109313795B, status Active
- 2016-04-11: PCT application PCT/CN2016/078960, publication WO2017177363A1, status Ceased (not entered into national phase)
Non-Patent Citations (1)
| Title |
|---|
| DONG, CHAO ET AL., IMAGE SUPER-RESOLUTION USING DEEP CONVOLUTIONAL NETWORKS, vol. 2, no. 38, 1 June 2015 (2015-06-01), pages 1 - 6, XP011591233, ISSN: 0162-8828 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110008817A (en) * | 2019-01-29 | 2019-07-12 | 北京奇艺世纪科技有限公司 | Model training, image processing method, device, electronic equipment and computer readable storage medium |
| CN110008817B (en) * | 2019-01-29 | 2021-12-28 | 北京奇艺世纪科技有限公司 | Model training method, image processing method, device, electronic equipment and computer readable storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN109313795B (en) | 2022-03-29 |
| CN109313795A (en) | 2019-02-05 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16898182; Country of ref document: EP; Kind code of ref document: A1 |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 16898182; Country of ref document: EP; Kind code of ref document: A1 |