WO2016113805A1 - Image processing method, image processing apparatus, image pickup apparatus, program, and storage medium - Google Patents
- Publication number
- WO2016113805A1 (PCT/JP2015/006324; JP2015006324W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- data
- weight
- depth
- image processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/2224—Studio circuitry; Studio devices; Studio equipment related to virtual studio applications
- H04N5/2226—Determination of depth image, e.g. for foreground/background separation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/122—Improving the 3D impression of stereoscopic images by modifying image signal contents, e.g. by filtering or adding monoscopic depth cues
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/243—Image signal generators using stereoscopic image cameras using three or more 2D image sensors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/45—Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from two or more image sensors being of different type or operating in different modes, e.g. with a CMOS sensor for moving images in combination with a charge-coupled device [CCD] for still images
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/80—Camera processing pipelines; Components thereof
- H04N23/81—Camera processing pipelines; Components thereof for suppressing or minimising disturbance in the image signal generation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N2013/0074—Stereoscopic image analysis
- H04N2013/0081—Depth or disparity estimation from stereoscopic image signals
Definitions
- the present invention relates to an image processing method which performs noise reduction of an image.
- PTL 1 discloses a method of removing a noise by using self-similarity of an object space called an NLM (non-local means) filter.
- the NLM filter can remove the noise by replacing a signal value of a target pixel with a weighted average signal value of a plurality of pixels arranged around the target pixel.
- a weight used in the weighted averaging is determined depending on a distance between a vector whose components are the signal values in a partial region near the target pixel and a vector generated similarly from a partial region around each surrounding pixel. Accordingly, the noise can be removed from the image while keeping the apparent sharpness of edges.
- pixels having different structures from those of pixels near the target pixel are also used in the weighted averaging.
- the weights for such pixels are set to be small, but they are numerous and therefore their influence on the weighted average cannot be ignored. Accordingly, a texture component of the image in which the high-frequency component is relatively weak is easily lost along with the noise. As described above, it is difficult to perform noise reduction of an image with high accuracy.
- the present invention provides an image processing method, an image processing apparatus, an image pickup apparatus, a program, and a storage medium which are capable of performing noise reduction of an image with high accuracy.
- An image processing method as one aspect of the present invention includes the steps of acquiring first data relating to a partial region including a target pixel from an input image, acquiring a plurality of second data relating to a plurality of partial regions, each including one of a plurality of reference pixels, determining a weight with respect to each of the plurality of second data depending on a correlation between the first data and each of the plurality of second data, and generating an output pixel corresponding to the target pixel based on the plurality of reference pixels and the weight, and at least one of the plurality of reference pixels and the weight is determined based on distance information of the input image.
- An image processing apparatus as another aspect of the present invention includes a storage device configured to store an input image, and an image processor configured to generate an output image based on the input image, and the image processor is configured to perform the image processing method.
- An image pickup apparatus as another aspect of the present invention includes an image pickup element configured to photoelectrically convert an optical image formed via an optical system to output image data, and an image processor configured to generate an output image from an input image based on the image data, and the image processor is configured to perform the image processing method.
- a program as another aspect of the present invention causes a computer to execute the image processing method.
- a storage medium as another aspect of the present invention stores the program.
- an image processing method, an image processing apparatus, an image pickup apparatus, a program, and a storage medium which are capable of performing noise reduction of an image with high accuracy can be provided.
- FIG. 1 is a block diagram of an image pickup apparatus in Embodiment 1.
- FIG. 2 is an external view of the image pickup apparatus in Embodiment 1.
- FIG. 3 is a schematic diagram of a parallax image acquirer in Embodiment 1.
- FIG. 4 is a flowchart of noise reduction processing in each of Embodiments 1 and 3.
- FIG. 5 is an explanatory diagram of an input image in each of Embodiments 1 to 3.
- FIG. 6A is an explanatory diagram of a depth map in each of Embodiments 1 and 3.
- FIG. 6B is an explanatory diagram of a depth map in each of Embodiments 1 and 3.
- FIG. 7 is a schematic diagram of an image pickup apparatus and an object space in each of Embodiments 1 to 3.
- FIG. 8A is a point spread function in each of Embodiments 1 to 3.
- FIG. 8B is a point spread function in each of Embodiments 1 to 3.
- FIG. 8C is a point spread function in each of Embodiments 1 to 3.
- FIG. 9A is a modulation transfer function in each of Embodiments 1 to 3.
- FIG. 9B is a modulation transfer function in each of Embodiments 1 to 3.
- FIG. 9C is a modulation transfer function in each of Embodiments 1 to 3.
- FIG. 10A is frequency characteristics of reference data in each of Embodiments 1 to 3.
- FIG. 10B is frequency characteristics of reference data in each of Embodiments 1 to 3.
- FIG. 11 is a block diagram of an image processing system in Embodiment 2.
- FIG. 12 is an external view of the image processing system in Embodiment 2.
- FIG. 13 is a schematic diagram of a parallax image acquirer in Embodiment 2.
- FIG. 14 is a flowchart of noise reduction processing in each of Embodiments 2 and 3.
- FIG. 15 is an explanatory diagram of a depth map in each of Embodiments 2 and 3.
- FIG. 16 is a block diagram of an image pickup system in Embodiment 3.
- FIG. 17 is an external view of the image pickup system in Embodiment 3.
- for simplicity, it is assumed that a signal value of an image has one dimension (i.e., monochrome); when an input image is a color image having a multidimensional signal value, the following processing may be performed for one dimensional component and repeated similarly for the other components.
- a term “pixel” relating to a certain image means a position, a signal value, or depth information (distance information) of a pixel.
- a partial region including a target pixel on which the noise reduction is to be performed is extracted as target data (first data) from the input image.
- a reference pixel acquiring region is set in the input image depending on the target pixel, and a plurality of reference pixels are selected from the reference pixel acquiring region.
- reference data (second data) as a partial region including each of the plurality of reference pixels are acquired, and a correlation value of the target data and each of the reference data is calculated.
- a weight of each of the reference data is determined depending on the correlation value. The weight is determined to increase as the correlation increases, i.e., as the similarity between the reference data and the target data increases.
- a weighted average signal value is calculated by using the weight, and a signal value of the target pixel is replaced with the weighted average signal value, and thus the noise reduction (noise reduction processing) is finished.
- at least one of the reference pixel acquiring region and the weight changes depending on a depth map of the input image.
- at least one of the reference pixel acquiring region and the weight changes depending on a color distribution of the input image. The color distribution can be acquired by using a multidimensional signal value of the input image.
- the depth map or the color distribution of the input image is used to acquire a region where there is a high possibility that the same object as that in the target data is included to emphasize the reference data in the region, and thus the influence of the reference data having a low similarity to the target data can be reduced. Accordingly, a noise reduction effect with a small loss of the texture, i.e., with high accuracy, can be achieved.
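- a minimal Python sketch (an assumption for illustration, not code from the publication; the array layout, the parameter names patch_r, search_r, h, and depth_thr, and the squared-difference correlation are illustrative) of how the depth-restricted reference pixel acquiring region and the correlation-based weighting fit together for a single target pixel is given below.

```python
import numpy as np

def denoise_pixel(img, depth, y, x, patch_r=2, search_r=7, h=0.1, depth_thr=0.05):
    """Depth-guided NLM-style replacement of one target pixel.

    img, depth : 2D float arrays of the same shape.
    The reference pixel acquiring region is restricted to pixels whose depth
    is close to the depth of the target pixel (first method described below).
    """
    H, W = img.shape
    # Target data: partial region (patch) around the target pixel.
    t = img[y - patch_r:y + patch_r + 1, x - patch_r:x + patch_r + 1]

    num, den = 0.0, 0.0
    for j in range(max(patch_r, y - search_r), min(H - patch_r, y + search_r + 1)):
        for i in range(max(patch_r, x - search_r), min(W - patch_r, x + search_r + 1)):
            # Restrict the reference pixel acquiring region by depth.
            if abs(depth[j, i] - depth[y, x]) >= depth_thr:
                continue
            # Reference data: partial region around the reference pixel.
            r = img[j - patch_r:j + patch_r + 1, i - patch_r:i + patch_r + 1]
            # Correlation value: mean squared difference of the two patches.
            d2 = np.mean((t - r) ** 2)
            # Weight grows as the correlation (similarity) grows.
            w = np.exp(-d2 / h ** 2)
            num += w * img[j, i]
            den += w
    # Replace the target signal value with the weighted average.
    return num / den if den > 0 else img[y, x]
```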
- FIG. 1 is a block diagram of an image pickup apparatus 100 in this embodiment.
- FIG. 2 is an external view of the image pickup apparatus 100.
- a parallax image acquirer 101 includes a plurality of imaging optical systems 102a to 102c and a plurality of image pickup elements 103a to 103c.
- 16 combinations of the imaging optical system and the image pickup element are arranged in two dimensions of 4 × 4, but fourth and subsequent imaging optical systems and image pickup elements are omitted in FIG. 1.
- Each of the image pickup elements 103a to 103c includes a CCD (Charge Coupled Device), a CMOS (Complementary Metal-Oxide Semiconductor), or the like.
- each of the image pickup elements 103a to 103c photoelectrically converts an optical image formed via the optical system (corresponding imaging optical systems 102a to 102c) to output image data.
- An A/D converter 104 converts the analog electric signals (image data) output from the image pickup elements 103a to 103c into digital signals (input image based on the image data), and it outputs the digital signals to an image processor 105.
- the image processor 105 acquires a depth map (distance information) of an object space and performs noise reduction processing in addition to predetermined processing to generate an output image from the input image.
- the detail of the noise reduction processing will be described below.
- the depth map acquired by the image processor 105 and optical information of the image pickup apparatus 100 are stored in a storage device 106 (memory).
- the optical information is information relating to a state of the parallax image acquirer 101 during capturing an image, and for example it is image capturing condition information such as a state of an aperture stop, a focus position, and a focal length.
- a state detector 111 can acquire the optical information from a system controller 109 or a controller 110.
- the image processed by the image processor 105 is stored in an image recording medium 108 (memory) in a predetermined format.
- the depth map and the optical information may be stored at the same time.
- the image stored in the image recording medium 108 may be read as an input image and the image processor 105 may perform the noise reduction processing in this embodiment on the input image.
- the image is output to a display device 107 such as a liquid crystal display.
- a series of controls described above is performed based on an instruction of the system controller 109.
- a mechanical drive of the parallax image acquirer 101 is performed by the controller 110 based on an instruction of the system controller 109.
- FIG. 3 is a schematic diagram of the parallax image acquirer 101 (imaging optical systems 102a to 102p).
- the parallax image acquirer 101 has a compound-eye configuration as illustrated in FIG. 3.
- the imaging optical systems 102a to 102p are arranged in two dimensions, and 16 corresponding image pickup elements (not illustrated) are arranged behind the respective imaging optical systems 102a to 102p.
- a single image pickup element may be provided if images (optical images) formed by the imaging optical systems 102a to 102p can be received.
- the image pickup elements corresponding to the respective imaging optical systems 102a to 102p may have the number of pixels different from each other.
- the imaging optical systems 102a to 102p are categorized into a plurality of types having focal lengths different from each other.
- the imaging optical systems 102a to 102d are wide-angle lenses
- the imaging optical systems 102e to 102h are standard lenses
- the imaging optical systems 102i to 102l are medium telephoto lenses
- the imaging optical systems 102m to 102p are telephoto lenses.
- the type, the number, and the arrangement of the imaging optical systems are not limited thereto.
- the parallax image acquirer 101 is not limited to the compound-eye configuration, and for example the configuration of the Plenoptic camera as described in Embodiment 2 below may be adopted.
- a single-viewpoint image acquirer may be provided if the image pickup apparatus 100 can acquire the depth map of the object space without using the parallax images.
- a TOF (Time of Flight) method or a structured illumination may be used as an example of acquiring the depth map of the object space without using the parallax images.
- FIG. 4 is a flowchart of the noise reduction processing.
- FIG. 5 is an explanatory diagram of the input image.
- FIGs. 6A and 6B are explanatory diagrams of the depth map. Each step in FIG. 4 is performed mainly by the image processor 105 based on an instruction of the system controller 109.
- the image processor 105 acquires an input image (captured image) and a depth map (distance information of an object space) relating to the input image.
- the input image is an image for which the noise reduction is to be performed.
- the input image may be any one of a single-viewpoint image, a plurality of parallax images, and a synthesized image of them, which are acquired by the parallax image acquirer 101.
- the depth map is acquired by using a stereo method or the like. In this case, the depth map can be estimated only for an edge region in an image where a feature point exists, and a depth in a non-edge region such as a gradation can be acquired by interpolation based on a depth in the edge region.
- the image processor 105 acquires, from the input image, a pixel (target pixel) for which the noise reduction is to be performed and target data (first data) relating to a partial region including the target pixel in the input image.
- a target pixel 201a and target data 203a are acquired from an input image 200.
- a position, a size, and a shape of each of the target pixel 201a and the target data 203a are not limited thereto.
- the target pixel 201a acquired at once may be a single pixel or a plurality of pixels.
- the target data 203a need to have information relating to a structure (i.e., distribution of signal values), and accordingly they are data relating to a partial region including a plurality of pixels.
- the target data 203a may coincide with the target pixel 201a.
- the image processor 105 determines a reference pixel acquiring region based on the depth map acquired at step S101. Then, as described below, the image processor 105 extracts a plurality of reference pixels from the reference pixel acquiring region and calculates a weighted average of their signal values to reduce the noise of the target pixel. The larger the number of reference pixels located near structures similar to the target data, the more accurately the noise can be reduced while the structure of the target pixel is preserved. There is a high possibility that a structure similar to that of the target data (in particular, a texture component) exists on the object where the target pixel exists.
- the same object region can be roughly specified in the input image by using the depth information (depth map, i.e., distance information in the object space). Accordingly, reference data (second data) having a structure similar to that of the target data can be effectively acquired by restricting the reference pixel acquiring region with the use of the depth map.
- the reference data are a partial region including the reference pixel, and they correspond to reference data 205a and 205b in FIG. 6A.
- the reference data will be described in detail when explaining step S104 in FIG. 4.
- the reference pixel acquiring region is determined based on the depth map, but this embodiment is not limited thereto.
- the same object region can be roughly acquired similarly based on a color distribution (color distribution information) of the input image. This will be described in Embodiment 2 in detail.
- a first method sets a depth threshold value (first threshold value), i.e., a threshold value relating to the depth, and excludes any region whose depth differs from the depth of the target data by the depth threshold value or more.
- outlines of objects are indicated by solid lines, but in an actual depth map, these outlines do not exist. This is because the objects existing at the same depth position (three objects and a floor in FIGs. 6A and 6B) cannot be distinguished. However, the outline can be obtained if the color distribution of the input image or the like is used.
- a reference pixel acquiring region 204a indicated by a dashed line is set such that regions whose depth differs from that of the target data 203a by more than the depth threshold value are excluded.
- a shape and a size of the reference pixel acquiring region 204a are not limited thereto.
- an object may be cut out based on information relating to a depth or a color, and a whole of the object may be set as a reference pixel acquiring region.
- the depth threshold value can be determined by creating a histogram for the depth of the input image and by using a mode method, an error minimization of the GMM (Gaussian Mixture Model), or the like.
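- as an illustration of the histogram/GMM approach mentioned above, the following sketch (an assumption; the two-component mixture and the candidate grid are illustrative choices, not taken from the embodiments) fits a Gaussian mixture to nearby depth values with scikit-learn and returns the depth at which the dominant component changes as a candidate depth threshold value.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def depth_threshold_from_gmm(depths):
    """Estimate a depth threshold separating two depth clusters.

    depths : 1D array of depth values around the target data.
    Returns the depth at which the most likely GMM component switches.
    """
    d = np.asarray(depths, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(d)

    # Scan a fine grid between the two component means and find where the
    # predicted component label changes.
    lo, hi = np.sort(gmm.means_.ravel())
    grid = np.linspace(lo, hi, 512).reshape(-1, 1)
    labels = gmm.predict(grid)
    change = np.nonzero(np.diff(labels))[0]
    return float(grid[change[0], 0]) if change.size else float((lo + hi) / 2.0)
```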
- the depth threshold value can be determined based on optical information of the image pickup apparatus 100, and its details will be described below.
- a second method as a method of restricting (method of determining) the reference pixel acquiring region based on the depth map will be described.
- next, a case in which the depth in the target data is discontinuous (for example, an edge region) is considered.
- the determination as to whether the depth is discontinuous or not can be performed by setting a depth differential threshold value that is a threshold value relating to a differential of the depth, and by determining whether an absolute value of the differential of the depth in the target data exceeds the depth differential threshold value.
- in this case, the target data include an edge structure, and the reference pixel acquiring region is similarly restricted to a region in which an edge exists.
- for example, as illustrated in FIG. 6B, the edge region in which the depth is discontinuous is selected as a target pixel 201b and target data 203b.
- a region which has a structure similar to that of a region near the target pixel 201b is only the edge region. Accordingly, the region in which the depth is discontinuous similarly is specified as a reference pixel acquiring region 204b.
- the method (first method) of restricting the reference pixel acquiring region based on the depth threshold value as described referring to FIG. 6A may be combined.
- the two methods as methods of restricting (method of determining) the reference pixel acquiring region based on the depth map are described, but this embodiment is not limited thereto.
- the values of the depths in the target data and in the reference data may be values of the depths of the target pixel and the reference pixel (depth average value of each pixel if each of the target pixel and the reference pixel includes a plurality of pixels).
- an average value of depths of all pixels in the target data or in the reference data may be adopted.
- when the target data or the reference data are an edge region in which the depth is discontinuous, there is a high possibility that the average depth of all the pixels in the data differs from the depth of the object of interest.
- in such a case, it is preferred that the depth is determined only from the target pixel or the reference pixel.
- for example, if the depth of the reference pixel 202d illustrated in FIG. 6B is taken as the average depth of all the pixels in the reference data 205d, depths of the background are mixed in, and accordingly there is a high possibility that the value is shifted from that of the object (rectangle).
- the image processor 105 acquires a plurality of reference pixels and reference data (second data) from the reference pixel acquiring region.
- reference pixels 202a and 202b and reference data 205a and 205b illustrated in FIG. 6A or reference pixels 202c and 202d and reference data 205c and 205d (third or subsequent pixels or data are omitted) illustrated in FIG. 6B are acquired.
- a size and a shape of each of the reference pixels and the reference data are not limited thereto.
- the sizes of the reference pixel and the reference data do not have to coincide with those of the target pixel and the target data, respectively, because the numbers of pixels can be matched by a size conversion described below.
- the reference data need to include information relating to a distribution of signals, and accordingly they are data relating to a plurality of pixels.
- when the target pixel is selected from a certain color component (for example, Green) in the input image, the reference pixel and the reference data may be acquired from another color component (Red or Blue).
- the image processor 105 calculates a correlation value of the target data and the reference data.
- the correlation value can be calculated by using a feature-based method such as SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust Features) or by using a region-based method described below.
- the feature-based method focuses on a feature amount, and accordingly the correlation value can be calculated even when the numbers of pixels in the target data and the reference data are different from each other.
- since the region-based method focuses on a difference of signal values, it is necessary to match the numbers of pixels in both data in order to calculate the correlation exactly.
- the calculation of the correlation using the region-based method can determine similarity with high accuracy compared to the calculation using the feature-based method, and accordingly it is preferred that the region-based method is used.
- a first correlation calculation expression uses a root-mean-square of signal values in the target data and the reference data.
- a correlation calculation expression g_1 (first correlation calculation expression) is represented by the following expression (1).
- symbol T is a matrix including components T_ij of signal values of respective pixels in the target data
- symbol N is the number of lines in the matrix T
- symbol M is the number of rows in the matrix T
- symbol R_k is a matrix including components of signal values of respective pixels in the k-th reference data.
- symbol N_Rk is the number of lines in the matrix R_k
- symbol M_Rk is the number of rows in the matrix R_k
- the conversion applied to (R_k, N/N_Rk, M/M_Rk) represents a resizing (magnification or reduction of an image) in which the number of lines in the matrix R_k is scaled by N/N_Rk and the number of rows is scaled by M/M_Rk.
- bilinear interpolation, bicubic interpolation, or the like may be used.
- expression (1) is rewritten as represented by the following expression (3).
- symbol t is a vector including components t_i of the signal values in the target data
- symbol r_k is a vector including components of signal values in the k-th reference data
- the vector obtained by rearranging the components of the matrix P (the size-converted reference data) in one dimension is also used, and its components are indexed by i.
- Each of the correlation calculation expressions represented by expressions (1) and (3) is an expression relating to a difference between the target data and the reference data, and accordingly, it means that the similarity of both data is higher as the value is closer to zero.
- a DC (direct-current) component, i.e., an average value corresponding to a brightness of an image, may be removed from each of the target data and the reference data when calculating the correlation.
- the correlation calculation determines the similarity level of the structures of the target data and the reference data, and therefore the brightness (direct-current component) is irrelevant to the correlation.
- a contrast in the reference data may be adjusted such that the correlation of them is maximized. This corresponds to multiplication of an AC (alternating-current) component by a scalar.
- expression (1) is rewritten as represented by the following expression (4).
- symbols T_ave and P_ave are average values of signal values in the matrixes T and P, respectively, and these average values may be calculated by a uniform weight or alternatively weighted averaging may be adopted.
- Symbol c is a coefficient which adjusts the contrast, and it is represented by the following expression (5) based on the method of least squares.
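- the bodies of expressions (1) and (3) to (5) do not appear above; the following LaTeX sketch gives plausible forms reconstructed from the surrounding symbol definitions (the size-conversion operator is written here as Λ and the vectorized size-converted reference data as ρ, both notational assumptions), consistent with a standard RMS-type region-based measure rather than a verbatim copy of the published formulas.

```latex
% Expression (1) (assumed form): RMS difference between the target data T
% and the size-converted k-th reference data.
g_1 = \sqrt{\frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}
      \bigl[T_{ij} - \Lambda(R_k, N/N_{Rk}, M/M_{Rk})_{ij}\bigr]^{2}}

% Expression (3) (assumed form): the same measure written with the vector t
% and the vector \rho obtained by rearranging the size-converted reference
% data in one dimension.
g_1 = \sqrt{\frac{1}{NM}\sum_{i=1}^{NM}(t_i - \rho_i)^{2}}

% Expression (4) (assumed form): DC components removed and the contrast of
% the reference data adjusted by a scalar c.
g_1 = \sqrt{\frac{1}{NM}\sum_{i,j}
      \bigl[(T_{ij} - T_{\mathrm{ave}}) - c\,(P_{ij} - P_{\mathrm{ave}})\bigr]^{2}}

% Expression (5) (assumed form): least-squares estimate of the contrast
% coefficient c.
c = \frac{\sum_{i,j}(T_{ij} - T_{\mathrm{ave}})(P_{ij} - P_{\mathrm{ave}})}
         {\sum_{i,j}(P_{ij} - P_{\mathrm{ave}})^{2}}
```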
- the second correlation calculation expression uses the SSIM (Structural Similarity) index, and it is represented by the following expression (6).
- symbols L, C, and S are evaluation functions relating to a brightness, a contrast, and other structures, each of which indicates a value from 0 to 1. It means that two signals to be compared are close to each other as each value is closer to 1.
- a plurality of correlation calculation expressions may be combined.
- isometric transformation may be performed for the reference data so that the correlation value of the target data and the reference data is maximized.
- the isometric transformation is for example identity transformation, rotation transformation, or inversion transformation.
- the transformation by which the correlation value is maximized is performed also for the reference pixel at step S107 in FIG. 4.
- the similarity tends to be increased by performing the isometric transformation.
- a calculation amount increases, and accordingly it is preferred that whether the isometric transformation is to be performed is determined by comparing the effect of the noise reduction and the calculation amount.
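- a compact Python sketch of the region-based correlation described above (an assumed implementation, not the published one): the reference data are resized to the target data by bilinear interpolation, the DC components are removed, the contrast coefficient is estimated by least squares, and simple isometric transformations (identity, rotations, flips) are tried to keep the best correlation; a smaller returned value means a higher similarity.

```python
import numpy as np
from scipy.ndimage import zoom

def correlation(target, ref):
    """RMS-type correlation measure; smaller means more similar."""
    # Size conversion so that the reference data match the target data
    # (bilinear interpolation, order=1).
    p = zoom(ref, (target.shape[0] / ref.shape[0],
                   target.shape[1] / ref.shape[1]), order=1)
    t0 = target - target.mean()    # remove the DC component
    p0 = p - p.mean()
    denom = np.sum(p0 * p0)
    c = np.sum(t0 * p0) / denom if denom > 0 else 0.0   # contrast coefficient
    return float(np.sqrt(np.mean((t0 - c * p0) ** 2)))

def best_correlation(target, ref):
    """Try identity, rotations, and flips of the reference data, keep the best."""
    candidates = [ref, np.rot90(ref, 1), np.rot90(ref, 2), np.rot90(ref, 3),
                  np.fliplr(ref), np.flipud(ref)]
    return min(correlation(target, cand) for cand in candidates)
```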
- the image processor 105 determines a weight (weight coefficient) for each of the plurality of reference data based on the correlation value calculated at step S105. As the correlation increases, the reference data are more similar to the target data and accordingly the weight is set to be larger. For example, the weight is determined as represented by the following expression (7) by using expression (3).
- symbol w_k is a weight corresponding to the k-th reference data
- symbol h is a strength of a filter.
- Symbol Z is a normalization factor of the weight w_k, which satisfies the following expression (8).
- the method of determining the weight is not limited thereto.
- a table of weights corresponding to respective correlation values may be previously stored to determine the weight referring to this table.
- the image processor 105 calculates a weighted average of the signal values of the reference pixels by using the weight determined at step S106. Then, the image processor 105 replaces the signal value of the target pixel with the calculated weighted average (weighted average signal value). Thus, the noise reduction of the target pixel is completed.
- a weighted average signal value s_ave is calculated by the following expression (9).
- symbol s_k is a signal value of the reference pixel in the k-th reference data.
- each of the signal value s_k and the weighted average signal value s_ave may be a vector quantity.
- the method of calculating the weighted average is not limited thereto, and other methods such as nonlinear coupling may be used.
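- the bodies of expressions (7) to (9) likewise do not appear above; forms consistent with the surrounding description (an NLM-type exponential weight with filter strength h, normalization of the weights, and a weighted average of the reference signal values) would be, as an assumption:

```latex
% Expression (7) (assumed form): weight of the k-th reference data, using the
% correlation value g_1 between the target data and the k-th reference data.
w_k = \frac{1}{Z}\exp\!\left(-\frac{g_1(t, r_k)^{2}}{h^{2}}\right)

% Expression (8) (assumed form): normalization of the weights.
\sum_{k} w_k = 1
\quad\Longleftrightarrow\quad
Z = \sum_{k}\exp\!\left(-\frac{g_1(t, r_k)^{2}}{h^{2}}\right)

% Expression (9) (assumed form): weighted average replacing the target pixel.
s_{\mathrm{ave}} = \sum_{k} w_k\, s_k
```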
- in this embodiment, the replacement processing is used for the noise reduction, but learning-type noise reduction processing using the weighted average signal value may also be performed.
- at step S108, the image processor 105 determines whether the processing on a predetermined region in the input image is finished. If the processing relating to all pixels on which the noise reduction is to be performed is not completed, the flow returns to step S102 and the image processor 105 selects a new target pixel. On the other hand, if the processing relating to all the pixels on which the noise reduction is to be performed is completed, the flow is finished. According to the processing described above, a loss of a texture component caused by the noise reduction of an image can be reduced to perform the noise reduction with high accuracy.
- FIG. 7 is a schematic diagram of the image pickup apparatus 100 and the object space.
- FIGs. 8A to 8C are point spread functions (PSFs).
- FIGs. 9A to 9C are modulation transfer functions (MTFs).
- x, y, and z represent x, y, and z axes in a three-dimensional coordinate, respectively, and the z axis indicates a depth direction.
- the image pickup apparatus 100 focuses on an in-focus plane 211.
- an out-of-focus plane 212 will be considered.
- the point spread function of the image pickup apparatus 100 corresponding to the out-of-focus plane 212 is represented as illustrated in FIG. 8A.
- FIG. 9A is an MTF corresponding to the point spread function of FIG. 8A.
- Symbol f_x is a spatial frequency in an x-axis direction, and only a positive quadrant is illustrated for simplicity.
- Symbol f_max denotes the spatial frequency at which the value of the MTF becomes zero in FIG. 9A.
- as the defocus increases, the acquirable maximum frequency decreases.
- as illustrated in FIG. 8B, a point image on the out-of-focus plane 213 in FIG. 7 is captured with a larger blur than that in FIG. 8A.
- FIG. 9B is an MTF corresponding to the point spread function of FIG. 8B.
- the acquirable maximum frequency is decreased compared to the case of FIG. 9A.
- An out-of-focus plane 214 in FIG. 7 is a plane further apart from the in-focus plane 211 compared to the out-of-focus plane 213.
- FIG. 8C is a point spread function on the out-of-focus plane 214, and FIG. 9C is an MTF of the point spread function of FIG. 8C.
- as described above, the acquirable information varies depending on the depth.
- an actually captured image is an image on which the point spread function of FIG. 8A or 8C is superimposed.
- since the acquirable frequency bands are different, the resulting structures in the input image also differ.
- the acquirable frequency component in each depth can be calculated based on the optical information of the image pickup apparatus 100. Accordingly, in this embodiment, it is preferred that the reference pixel acquiring region is determined depending on the optical information.
- the optical information is, for example, a focusing distance, a focal length, an F number, an optical transfer function (OTF), a point spread function (PSF), a spread amount of an image caused by aberration, diffraction, or defocusing, and the like.
- the acquirable frequency component can be exactly acquired if the OTF, the MTF, or the PSF in each depth is known, and it can be approximately obtained based on the focal length and the F number.
- when the input image is an image obtained by synthesizing a plurality of parallax images, the F number is determined based on the aperture obtained by synthesizing the apertures of the imaging optical systems that acquire the respective parallax images, and the other optical information is likewise determined in accordance with the synthesis of the images.
- for example, a frequency threshold value f_thr is determined for the depth of the target data, and the edge of the depth range where the MTF at the frequency f_thr is greater than or equal to a predetermined value r_thr is defined as the depth threshold value.
- An example of the reference pixel acquiring region determined by this method is the reference pixel acquiring depth range 215 in FIG. 7. The depth at which the defocus blur reaches the same size differs in front of and behind the in-focus plane 211, and accordingly the depth threshold values also differ in front of and behind the out-of-focus plane 212, as illustrated in FIG. 7.
- in FIG. 7, the case in which the target data exist on the out-of-focus plane 212 is illustrated, and this embodiment is similarly applied when the target data exist on the in-focus plane 211 or on another plane.
- this determination method is especially effective when the target data are acquired at a depth close to the in-focus plane.
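- when only the focal length, the F number, and the focusing distance are available, the acquirable frequency band at each depth can be approximated from the geometric circle of confusion; the following sketch (a thin-lens textbook approximation with a rough cutoff of about one cycle per blur diameter, an assumption rather than a formula from the embodiments) marks the depths at which a chosen frequency threshold f_thr is still expected to be resolvable.

```python
import numpy as np

def coc_diameter(z, z_focus, f, F):
    """Geometric circle-of-confusion diameter at object depth z for a lens of
    focal length f and F-number F focused at z_focus (thin-lens model,
    all lengths in the same unit)."""
    A = f / F                                   # aperture diameter
    return A * f * abs(z - z_focus) / (z * (z_focus - f))

def acquirable_depth_mask(depths, z_focus, f, F, f_thr):
    """True where the approximate defocus cutoff frequency (about one cycle per
    blur diameter) is still at least the frequency threshold f_thr, i.e. where
    the frequency band of the target data is expected to be acquirable."""
    c = np.array([coc_diameter(z, z_focus, f, F) for z in depths])
    f_cut = 1.0 / np.maximum(c, 1e-12)          # rough cutoff in cycles/length
    return f_cut >= f_thr
```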
- in another method, the depth threshold value is determined depending on whether the shape of the point spread function (or the MTF or the like) at each depth is similar to that at the depth of the target data.
- the similarity of the point spread functions may be determined, for example, by a calculation using an expression similar to expression (1), replacing the intensity of the point image with the signal value of a pixel. The edge of the depth range where the similarity satisfies a predetermined condition may be set as the depth threshold value.
- in this manner, a depth in which the acquirable frequency band is different from that of the depth of the target data (i.e., a depth in which the frequency is insufficient or excessive) can be excluded from the reference pixel acquiring region.
- the depth threshold value used at step S103 is determined by considering frequency characteristics of the target data in addition to the optical information of the image pickup apparatus 100.
- the structure of the target data depends on not only the point spread function in the depth of the target data but also a structure of an object.
- a case in which the target data are acquired from the out-of-focus plane 212 in FIG. 7 will be considered.
- FIGs. 10A and 10B are frequency characteristics of the target data in this case.
- FIG. 10A illustrates a case in which an object included in the target data has a fine structure
- FIG. 10B illustrates a case in which an object in the target data has a rough structure (only with a low frequency).
- a structure similar to the target data having the frequency characteristics of FIG. 10A can exist only at depths where such high-frequency components are acquirable (near the in-focus plane 211 or the out-of-focus plane 212).
- the depth threshold value changes depending on the frequency characteristics of the target data.
- a frequency at which the spectral intensity of the target data is less than or equal to a predetermined value r_thr is taken as the threshold value f_thr, and the first determination method of the depth threshold value using the optical information described above may then be used.
- since the threshold value f_thr varies depending on the frequency characteristics of the target data, the depth threshold value also changes accordingly.
- this embodiment includes a step of acquiring a map of a depth reliability that represents an accuracy of the depth map of the input image, and processing is switched when determining the reference pixel acquiring region at step S103 depending on the depth reliability.
- the depth map is calculated based on the parallax images, and accordingly the accuracy of estimation of the depth is decreased if for example the number of corresponding points of the parallax images is small.
- the accuracy of acquisition may be decreased due to a disturbance or characteristics of an object surface.
- if these low-accuracy depths are used for the processing, the effect of this embodiment is reduced. Therefore, it is preferred that the processing is switched depending on the depth reliability.
- a threshold value (second threshold value) relating to the depth reliability is set, and a region in which the depth reliability is lower than the second threshold value is removed from the reference pixel acquiring region. Accordingly, the possibility that the reference pixel having a different structure from that of a pixel near the target pixel is synthesized at step S107 can be decreased.
- this embodiment may also set another threshold value (third threshold value) relating to the depth reliability, different from the second threshold value described above, and the reference pixel acquiring region is determined without using the depth map at step S103 when the depth reliability of the target data is lower than the third threshold value.
- if the reference pixel acquiring region were determined by using the depth map when the depth reliability of the target data is low, the reference data might be acquired only from an object different from the target data. Accordingly, when the reliability is lower than the third threshold value, the reference pixel acquiring region is determined independently of the depth information, and as a result the acquisition of only reference data with low similarity to the target data can be avoided.
- the depth reliability may be defined such that the reliability increases in a region where there are a lot of corresponding points of the parallax images or there is an edge with strong intensity. This is because the accuracy of calculating the depth is improved in the region where there are many corresponding points or there is a strong edge portion.
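- as a simple illustration (the reliability proxy, the smoothing scale, and the threshold below are illustrative assumptions), the locally averaged gradient strength of the input image can stand in for the depth reliability, following the remark above that strong edges and many corresponding points yield more accurate depths; pixels below the second threshold value are then removed from the reference pixel acquiring region.

```python
import numpy as np
from scipy.ndimage import sobel, gaussian_filter

def depth_reliability_map(img, sigma=3.0):
    """Reliability proxy: locally averaged gradient magnitude, scaled to [0, 1]."""
    gx, gy = sobel(img, axis=1), sobel(img, axis=0)
    rel = gaussian_filter(np.hypot(gx, gy), sigma)
    return rel / (rel.max() + 1e-12)

def restrict_by_reliability(region_mask, reliability, second_thr=0.2):
    """Remove low-reliability pixels from the reference pixel acquiring region."""
    return region_mask & (reliability >= second_thr)
```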
- an image pickup apparatus which is capable of performing highly-accurate noise reduction of an image can be provided.
- FIG. 11 is a block diagram of an image processing system 300 in this embodiment.
- FIG. 12 is an external view of the image processing system 300.
- the image processing system 300 in this embodiment includes an image pickup apparatus and, separately from it, an image processing apparatus which performs the noise reduction processing of this embodiment; color information is used to restrict a reference pixel acquiring region in the noise reduction processing, and depth information is used in calculating a weight of reference data.
- the input image acquired by an image pickup apparatus 301 is output to an image processing apparatus 302 via a communication device 303.
- the image pickup apparatus 301 is capable of acquiring a parallax image
- a storage device 304 stores a depth map acquired from parallax information and optical information determined at the time of capturing the input image.
- a noise reduction device 305 (image processor) in the image processing apparatus 302 performs the noise reduction processing on the input image.
- the output image processed by the noise reduction device 305 is output to at least one of a display apparatus 306, a recording medium 307, and an output device 308 via the communication device 303.
- the display apparatus 306 is for example a liquid crystal display or a projector.
- a user can work while confirming the image under processing through the display apparatus 306.
- a recording medium 307 is for example a semiconductor memory, a hard disk, or a server on a network.
- the output device 308 is for example a printer.
- the image processing apparatus 302 has a function that performs development processing and other image processing as needed.
- FIG. 13 is a schematic diagram of the parallax image acquirer in the image pickup apparatus 301.
- the parallax image acquirer includes an imaging optical system 301a, a lens array 301b, and an image pickup element 301c.
- the lens array 301b is disposed on a plane conjugate to an in-focus plane 311 via the imaging optical system 301a.
- the lens array 301b is configured so that an exit pupil of the imaging optical system 301a and the image pickup element 301c approximately have a conjugate relation.
- Light rays from an object space pass through the imaging optical system 301a and the lens array 301b, and they enter different pixels from each other of the image pickup element 301c depending on a pupil region (i.e., viewpoint) of the imaging optical system 301a through which the light rays pass.
- a viewpoint is divided into five viewpoints in one-dimensional direction, and accordingly 25 parallax images are acquired in two dimensions.
- the number of the viewpoints is not limited thereto.
- the configuration illustrated in FIG. 13 is called a Plenoptic 1.0 camera, which is described in Japanese Patent No. 4752031 in detail.
- as a Plenoptic camera capable of obtaining parallax images, other configurations, for example the one disclosed in US Patent No. 7962033, may also be adopted.
- An object does not need to exist on the in-focus plane 311 (i.e., focusing may be performed on a space where nothing exists). This is because a focus position control can be performed after capturing images, which is called refocusing, by synthesizing the parallax images.
- FIG. 14 is a flowchart of the noise reduction processing in this embodiment.
- FIG. 15 is an explanatory diagram of the depth map in this embodiment.
- Each step of FIG. 14 is performed mainly by the noise reduction device 305 based on an instruction of a system controller (not illustrated) included in the image processing apparatus 302.
- in FIG. 14, the same descriptions as those given in Embodiment 1 referring to FIG. 4 are omitted.
- Steps S201 and S202 are the same as steps S101 and S102 in FIG. 4, respectively. Subsequently, at step S203, the noise reduction device 305 determines the reference pixel acquiring region based on color distribution (color distribution information) of the input image.
- a reference pixel acquiring region 404 indicated by a dashed line is set as a region which is located within a range of the predetermined number of pixels in a horizontal direction and in a vertical direction with reference to a target pixel 401 and has similar color information to that in a target data 403 (in this embodiment, a circle and a triangle are objects with similar colors).
- the method of determining the reference pixel acquiring region 404 is not limited thereto. By restricting the reference pixel acquiring region 404 to a region which has a similar color, acquisition of the reference pixel from an object having a different structure can be avoided. For dividing regions based on colors, for example K-means clustering can be used. Alternatively, a color space may be previously divided into some groups, and the reference pixel acquiring region 404 may be determined depending on the group to which each pixel of the input image belongs.
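- a minimal sketch of the color-based restriction described above, using K-means clustering (the number of clusters and the window size are illustrative assumptions, not values from the embodiments), is given below.

```python
import numpy as np
from sklearn.cluster import KMeans

def color_region_mask(img_rgb, y, x, window_r=50, n_clusters=8):
    """Mask of pixels that lie within a window around the target pixel (y, x)
    and belong to the same K-means color cluster as the target pixel."""
    h, w, _ = img_rgb.shape
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0) \
        .fit_predict(img_rgb.reshape(-1, 3)).reshape(h, w)

    mask = np.zeros((h, w), dtype=bool)
    y0, y1 = max(0, y - window_r), min(h, y + window_r + 1)
    x0, x1 = max(0, x - window_r), min(w, x + window_r + 1)
    # Keep only pixels in the window that share the target pixel's color cluster.
    mask[y0:y1, x0:x1] = labels[y0:y1, x0:x1] == labels[y, x]
    return mask
```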
- steps S204 and S205 in FIG. 14 are the same as steps S104 and S105 in FIG. 4, respectively.
- the noise reduction device 305 acquires reference pixels 402a and 402b and reference data 405a and 405b (third or subsequent pixels or data are omitted) to calculate a correlation value of target data 403 and the reference data.
- the noise reduction device 305 determines the weight of the reference data by using the depth map and the correlation value calculated at step S205.
- the weight is increased as the depths of the target data and the reference data are closer to each other. For example, in FIG. 15, the weight of the reference pixel 402a existing on a depth similar to that of the target pixel 401 is increased, and on the other hand the weight of the reference pixel 402b which has a different depth from that of the target pixel 401 is decreased.
- the weight is represented for example by the following expression (10).
- symbol v_k is a weight corresponding to the k-th reference data
- symbol D_0 is a depth of the target data
- symbol D_k is a depth of the k-th reference data
- symbol d is a scaling parameter of the depth.
- Symbol Z_1 is a normalization factor of the weight v_k, which satisfies the following expression (11).
- the method of determining the weight is not limited thereto.
- the weight may be determined by using color information, as well as the depth.
- expression (10) is rewritten as represented by the following expression (12).
- symbol u_k is a weight corresponding to the k-th reference data
- symbol σ_0k is a distance between averaged pixel values in the target data and the k-th reference data in a color space
- a corresponding scaling parameter of that color-space distance is also used.
- Symbol Z_2 is a normalization factor of the weight u_k, which satisfies the following expression (13).
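- the bodies of expressions (10) to (13) do not appear above; forms consistent with the symbol definitions (the correlation-based weight of expression (7) further modulated by depth proximity and, in expression (12), by color proximity; Z_1, Z_2, and the color-distance scaling parameter written here as σ_c are notational assumptions) would be:

```latex
% Expression (10) (assumed form): depth-modulated weight of the k-th
% reference data; Z_1 is its normalization factor.
v_k = \frac{1}{Z_1}\,
      \exp\!\left(-\frac{g_1(t, r_k)^{2}}{h^{2}}\right)
      \exp\!\left(-\frac{(D_0 - D_k)^{2}}{d^{2}}\right)

% Expression (11) (assumed form): normalization of the weights v_k.
\sum_{k} v_k = 1

% Expression (12) (assumed form): weight further modulated by the color-space
% distance \sigma_{0k}, with scaling parameter \sigma_c and normalization Z_2.
u_k = \frac{1}{Z_2}\,
      \exp\!\left(-\frac{g_1(t, r_k)^{2}}{h^{2}}\right)
      \exp\!\left(-\frac{(D_0 - D_k)^{2}}{d^{2}}\right)
      \exp\!\left(-\frac{\sigma_{0k}^{2}}{\sigma_c^{2}}\right)

% Expression (13) (assumed form): normalization of the weights u_k.
\sum_{k} u_k = 1
```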
- Steps S207 and S208 are the same as steps S107 and S108 in FIG. 4, respectively. According to the processing described above, a loss of a texture component caused by noise reduction of an image can be decreased to perform highly-accurate noise reduction.
- the noise reduction device 305 acquires information relating to a resolution limit of the reference data based on the optical information of the image pickup apparatus 301 capturing the input image and the depth map. Then, it is preferred that the noise reduction device 305 changes the weight depending on the information relating to the acquired resolution limit and the frequency characteristics of the reference data.
- the input image is blurred due to the aberration, the diffraction, or the defocusing, and accordingly there is a spatial frequency (resolution limit) as a limit of the resolution in the input image. Therefore, if there is a higher frequency component than the resolution limit, it is a noise component.
- the resolution limit can be calculated in each region of the input image. Therefore, the resolution limit of the reference data is calculated to be compared with the frequency characteristics of the reference data, and thus a part of a generated noise amount can be evaluated. It is believed that the reference data in which the MTF is large at a higher frequency than the resolution limit contain a large amount of noise, and therefore it is preferred that the weight is decreased. Accordingly, the effect of the noise reduction can be further improved.
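- one way to realize this check, sketched below under the assumption that the resolution limit is available as a cutoff frequency in cycles per pixel (the exponential penalty and its scale are illustrative), is to estimate the fraction of spectral energy of the reference data above that cutoff and to shrink the weight accordingly.

```python
import numpy as np

def high_frequency_ratio(patch, f_limit):
    """Fraction of the spectral energy of `patch` at radial frequencies above
    the resolution limit f_limit (in cycles per pixel)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(patch - patch.mean()))) ** 2
    h, w = patch.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radial = np.hypot(fy, fx)
    return float(spec[radial > f_limit].sum() / (spec.sum() + 1e-12))

def adjust_weight(weight, patch, f_limit, scale=0.1):
    """Reduce the weight of reference data whose energy above the resolution
    limit (likely noise) is large."""
    return weight * np.exp(-high_frequency_ratio(patch, f_limit) / scale)
```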
- as in Embodiment 1, it is preferred that a map of the depth reliability is acquired and the weight is set to be smaller as the depth reliability of the reference data is lower. Thus, the effect of the noise reduction in this embodiment can be achieved with higher accuracy.
- when the depth reliability of the target data is low, it is preferred that the weight is determined independently of the depth. According to the configuration described above, an image processing system which is capable of performing highly-accurate noise reduction of an image can be provided.
- FIG. 16 is a block diagram of an image pickup system 500 in this embodiment.
- FIG. 17 is an external view of the image pickup system 500.
- an image pickup apparatus is connected to a server via a wireless or wired network, and an image processor in the server is capable of performing noise reduction processing on an image transferred from the image pickup apparatus to the server.
- An image pickup apparatus 501 includes a TOF (Time of Flight) image pickup element, and it is capable of acquiring an input image and a depth map (distance information) of the input image by photographing.
- the image pickup apparatus 501 is connected to a server 503 (image processing apparatus) via the network.
- when the image pickup apparatus 501 captures an image, the input image (captured image) and the depth map are automatically or manually input to the server 503, and a storage device 505 (memory or memory circuit) stores the input image and the depth map.
- optical information of the image pickup apparatus 501 is also stored in the storage device 505.
- An image processor 506 (image processing circuit) performs noise reduction processing (image processing method) on the input image to generate an output image based on the input image.
- the processed output image is output to the image pickup apparatus 501 or stored in the storage device 505.
- the image processing method (noise reduction processing) in this embodiment is the same as that of Embodiment 1 or Embodiment 2 described referring to FIG. 4 or FIG. 14, and accordingly descriptions thereof are omitted.
- the image processing method in each embodiment acquires first data (target data) relating to a partial region including a target pixel from an input image (S102, S202). Subsequently, the method acquires a plurality of second data (a plurality of reference data) relating to a plurality of partial regions, each including one of a plurality of reference pixels (S104, S204). Then, the method determines a weight with respect to each of the plurality of second data depending on a correlation between the first data and each of the plurality of second data (S106, S206), and generates an output pixel (output image including the output pixel) corresponding to the target pixel based on the plurality of reference pixels and the weight (S107, S207).
- At least one of the plurality of reference pixels and the weight is determined based on distance information of the input image.
- a part of steps in each embodiment can be excluded, or at least a part of steps in Embodiments 1 and 2 can be combined.
- the image processing method determines a reference pixel acquiring region depending on the target pixel (S103, S203), and the plurality of reference pixels are selected from the reference pixel acquiring region.
- at least one of the plurality of reference pixels and the weight is determined based on color distribution information of the input image.
- a signal value of the target pixel is replaced with a signal value calculated based on signal values of the plurality of reference pixels and the weight to generate the output pixel.
- the distance information of the input image is a depth map of an object space in the input image.
- the reference pixel acquiring region is determined so as not to include a region having a depth different from a depth in the first data by at least a first threshold value. More preferably, the first threshold value is determined depending on optical information of an image pickup apparatus capturing the input image. More preferably, the optical information contains a modulation transfer function (MTF) or a point spread function (PSF) of the image pickup apparatus. More preferably, the first threshold value is determined depending on frequency characteristics of the first data. More preferably, when determining the reference pixel acquiring region (S103), the reference pixel acquiring region varies depending on a differential value (edge region) of a depth in the first data.
- the weight decreases with increasing a difference between a depth in the first data and a depth in the second data.
- information relating to a resolution limit of the second data is acquired based on the depth map and optical information of the image pickup apparatus capturing the input image, and the weight changes depending on the information relating to the resolution limit and frequency characteristics of the second data.
- preferably, when determining at least one of the reference pixel acquiring region and the weight, whether the depth map is to be considered or not is determined depending on a depth reliability relating to an accuracy of the depth map. More preferably, when determining the reference pixel acquiring region (S103), the reference pixel acquiring region is determined so as not to include a region which has the depth reliability lower than a second threshold value. More preferably, when determining the weight (S206), the weight decreases with decreasing the depth reliability in the second data. More preferably, when the depth reliability in the first data is lower than a third threshold value, at least one of the reference pixel acquiring region and the weight is determined independently of the depth map.
- Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a 'non-transitory computer-readable storage medium') to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s).
- the computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions.
- the computer executable instructions may be provided to the computer, for example, from a network or the storage medium.
- the storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD) TM ), a flash memory device, a memory card, and the like.
- an image processing method, an image processing apparatus, an image pickup apparatus, a program, and a storage medium which are capable of performing noise reduction of an image with high accuracy can be provided.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Human Computer Interaction (AREA)
- Image Processing (AREA)
- Studio Devices (AREA)
Abstract
An image processing method includes the steps of acquiring first data relating to a partial region including a target pixel from an input image, acquiring a plurality of second data relating to a plurality of partial regions, each including one of a plurality of reference pixels, determining a weight with respect to each of the plurality of second data depending on a correlation between the first data and each of the plurality of second data, and generating an output pixel corresponding to the target pixel based on the plurality of reference pixels and the weight, and at least one of the plurality of reference pixels and the weight is determined based on distance information of the input image.
Description
The present invention relates to an image processing method which performs noise reduction of an image.
Recently, along with the achievement of a higher definition of a display apparatus, the improvement of a quality of an image is required. In order to improve the quality of the image, it is important to reduce a noise from the image.
[PTL 1] US Patent No. 8427559
However, in the method of PTL 1, pixels having structures different from those of pixels near the target pixel are also used in the weighted averaging. Although the weights for such pixels are set small, they are numerous, and therefore their influence on the weighted average cannot be ignored. As a result, a texture component of an image whose high-frequency component is relatively weak is easily lost along with the noise. For this reason, it is difficult to perform the noise reduction of an image with high accuracy.
The present invention provides an image processing method, an image processing apparatus, an image pickup apparatus, a program, and a storage medium which are capable of performing noise reduction of an image with high accuracy.
An image processing method as one aspect of the present invention includes the steps of acquiring first data relating to a partial region including a target pixel from an input image, acquiring a plurality of second data relating to a plurality of partial regions, each including one of a plurality of reference pixels, determining a weight with respect to each of the plurality of second data depending on a correlation between the first data and each of the plurality of second data, and generating an output pixel corresponding to the target pixel based on the plurality of reference pixels and the weight, and at least one of the plurality of reference pixels and the weight is determined based on distance information of the input image.
An image processing apparatus as another aspect of the present invention includes a storage device configured to store an input image, and an image processor configured to generate an output image based on the input image, and the image processor is configured to perform the image processing method.
An image pickup apparatus as another aspect of the present invention includes an image pickup element configured to photoelectrically convert an optical image formed via an optical system to output image data, and an image processor configured to generate an output image from an input image based on the image data, and the image processor is configured to perform the image processing method.
A program as another aspect of the present invention causes a computer to execute the image processing method.
A storage medium as another aspect of the present invention stores the program.
Further features and aspects of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
According to the present invention, an image processing method, an image processing apparatus, an image pickup apparatus, a program, and a storage medium which are capable of performing noise reduction of an image with high accuracy can be provided.
Exemplary embodiments of the present invention will be described below with reference to the accompanied drawings. In each of the drawings, the same elements will be denoted by the same reference numerals and the duplicate descriptions thereof will be omitted.
Before specific descriptions, an outline of noise reduction (noise reduction processing) in this embodiment will be described briefly. For the purpose of simplifying descriptions, it is assumed that a signal value of an image has one dimension (i.e., monochrome). When an input image is a color image having a multidimensional signal value, the following processing may be performed only for a certain dimensional component and it may be repeated for the other dimensional components similarly. In this embodiment, a term “pixel” relating to a certain image means a position, a signal value, or depth information (distance information) of a pixel.
First, a partial region including a target pixel on which the noise reduction is to be performed is extracted as target data (first data) from the input image. Next, a reference pixel acquiring region is set in the input image depending on the target pixel, and a plurality of reference pixels are selected from the reference pixel acquiring region. Furthermore, reference data (second data), each being a partial region including one of the plurality of reference pixels, are acquired, and a correlation value between the target data and each of the reference data is calculated. A weight of each of the reference data is determined depending on the correlation value. The weight is set larger as the correlation increases, i.e., as the reference data become more similar to the target data. Finally, a weighted average signal value of the reference pixels in the reference data is calculated by using the weights, and the signal value of the target pixel is replaced with the weighted average signal value, which completes the noise reduction (noise reduction processing). In this embodiment, at least one of the reference pixel acquiring region and the weight changes depending on a depth map of the input image. Furthermore, preferably, at least one of the reference pixel acquiring region and the weight changes depending on a color distribution of the input image. The color distribution can be acquired by using a multidimensional signal value of the input image.
This uses a fact that there is a high possibility that the reference data having a similar structure to that of the target data exist in an object where the target data is extracted. In other words, the depth map or the color distribution of the input image is used to acquire a region where there is a high possibility that the same object as that in the target data is included to emphasize the reference data in the region, and thus the influence of the reference data having a low similarity to the target data can be reduced. Accordingly, a noise reduction effect with a small loss of the texture, i.e., with high accuracy, can be achieved.
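To make the outlined flow concrete, the following is a minimal Python sketch of depth-restricted weighted averaging (a brute-force, monochrome illustration; the patch size, search range, filter strength h, and depth threshold are hypothetical parameters and not values from this disclosure):

```python
import numpy as np

def depth_aware_nlm(image, depth, patch=3, search=10, h=0.05, depth_thr=0.1):
    """Sketch of noise reduction with a depth-restricted reference region.

    image, depth: 2-D float arrays of the same shape (monochrome case).
    """
    out = image.copy()
    H, W = image.shape
    for y in range(patch, H - patch):
        for x in range(patch, W - patch):
            target = image[y - patch:y + patch + 1, x - patch:x + patch + 1]  # first data
            num = 0.0
            den = 0.0
            y0, y1 = max(patch, y - search), min(H - patch, y + search + 1)
            x0, x1 = max(patch, x - search), min(W - patch, x + search + 1)
            for yy in range(y0, y1):
                for xx in range(x0, x1):
                    # restrict the reference pixel acquiring region by depth
                    if abs(depth[yy, xx] - depth[y, x]) > depth_thr:
                        continue
                    ref = image[yy - patch:yy + patch + 1, xx - patch:xx + patch + 1]  # second data
                    d2 = np.mean((target - ref) ** 2)  # correlation: mean squared difference
                    w = np.exp(-d2 / h ** 2)           # larger weight for higher similarity
                    num += w * image[yy, xx]
                    den += w
            out[y, x] = num / den                      # weighted average signal value
    return out
```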
First, referring to FIGs. 1 and 2, an image pickup apparatus which is capable of performing an image processing method in Embodiment 1 of the present invention will be described. FIG. 1 is a block diagram of an image pickup apparatus 100 in this embodiment. FIG. 2 is an external view of the image pickup apparatus 100.
A parallax image acquirer 101 (image acquirer) includes a plurality of imaging optical systems 102a to 102c and a plurality of image pickup elements 103a to 103c. In this embodiment, actually, 16 combinations of the imaging optical system and the image pickup element are arranged in two dimensions of 4×4, but fourth and subsequent imaging optical systems and image pickup elements are omitted in FIG. 1. Each of the image pickup elements 103a to 103c includes a CCD (Charge Coupled Device), a CMOS (Complementary Metal-Oxide Semiconductor), or the like. In capturing an image, light entering the parallax image acquirer 101 is collected by the imaging optical systems 102a to 102c and then converted by the image pickup elements 103a to 103c to analog electric signals. In other words, each of the image pickup elements 103a to 103c photoelectrically converts an optical image formed via the optical system (corresponding imaging optical systems 102a to 102c) to output image data. An A/D converter 104 converts the analog electric signals (image data) output from the image pickup elements 103a to 103c into digital signals (input image based on the image data), and it outputs the digital signals to an image processor 105.
The image processor 105 acquires a depth map (distance information) of an object space and performs noise reduction processing in addition to predetermined processing to generate an output image from the input image. The detail of the noise reduction processing will be described below. The depth map acquired by the image processor 105 and optical information of the image pickup apparatus 100 are stored in a storage device 106 (memory). The optical information is information relating to a state of the parallax image acquirer 101 during capturing an image, and for example it is image capturing condition information such as a state of an aperture stop, a focus position, and a focal length. A state detector 111 can acquire the optical information from a system controller 109 or a controller 110. The image processed by the image processor 105 is stored in an image recording medium 108 (memory) in a predetermined format. In this case, the depth map and the optical information may be stored at the same time. The image stored in the image recording medium 108 may be read as an input image and the image processor 105 may perform the noise reduction processing in this embodiment on the input image. When the image stored in the image recording medium 108 is to be watched, the image is output to a display device 107 such as a liquid crystal display.
A series of controls described above is performed based on an instruction of the system controller 109. A mechanical drive of the parallax image acquirer 101 is performed by the controller 110 based on an instruction of the system controller 109.
Next, referring to FIG. 3, the configuration of the parallax image acquirer 101 will be described in detail. FIG. 3 is a schematic diagram of the parallax image acquirer 101 (imaging optical systems 102a to 102p). The parallax image acquirer 101 has a compound-eye configuration as illustrated in FIG. 3. The imaging optical systems 102a to 102p are arranged in two dimensions, and 16 corresponding image pickup elements (not illustrated) are arranged behind the respective imaging optical systems 102a to 102p. A single image pickup element may be provided if images (optical images) formed by the imaging optical systems 102a to 102p can be received. The image pickup elements corresponding to the respective imaging optical systems 102a to 102p may have the number of pixels different from each other.
The imaging optical systems 102a to 102p are categorized into a plurality of types having focal lengths different from each other. In this embodiment, the imaging optical systems 102a to 102d are wide-angle lenses, the imaging optical systems 102e to 102h are standard lenses, the imaging optical systems 102i to 102l are medium telephoto lenses, and the imaging optical systems 102m to 102p are telephoto lenses. In this embodiment, however, the type, the number, and the arrangement of the imaging optical systems are not limited thereto. The parallax image acquirer 101 is not limited to the compound-eye configuration, and for example the configuration of the Plenoptic camera as described in Embodiment 2 below may be adopted. A single-viewpoint image acquirer may be provided if the image pickup apparatus 100 can acquire the depth map of the object space without using the parallax images. As an example of acquiring the depth map of the object space without using the parallax images, a TOF (Time of Flight) method or a structured illumination may be used.
Next, referring to FIGs. 4 to 6A and 6B, the noise reduction processing in this embodiment will be described. FIG. 4 is a flowchart of the noise reduction processing. FIG. 5 is an explanatory diagram of the input image. FIGs. 6A and 6B are explanatory diagrams of the depth map. Each step in FIG. 4 is performed mainly by the image processor 105 based on an instruction of the system controller 109.
First, at step S101, the image processor 105 acquires an input image (captured image) and a depth map (distance information of an object space) relating to the input image. The input image is an image for which the noise reduction is to be performed. The input image may be any one of a single-viewpoint image, a plurality of parallax images, and a synthesized image of them, which are acquired by the parallax image acquirer 101. In this embodiment, since the parallax information of the object space is acquired by the parallax image acquirer 101, the depth map is acquired by using a stereo method or the like. In this case, the depth map can be estimated only for an edge region in an image where a feature point exists, and a depth in a non-edge region such as a gradation can be acquired by interpolation based on a depth in the edge region.
Subsequently, at step S102, the image processor 105 acquires, from the input image, a pixel (target pixel) for which the noise reduction is to be performed and target data (first data) relating to a partial region including the target pixel in the input image. As illustrated in FIG. 5, a target pixel 201a and target data 203a are acquired from an input image 200. However, a position, a size, and a shape of each of the target pixel 201a and the target data 203a are not limited thereto. The target pixel 201a acquired at once may be a single pixel or a plurality of pixels. On the other hand, the target data 203a need to have information relating to a structure (i.e., distribution of signal values), and accordingly they are data relating to a partial region including a plurality of pixels. When the target pixel 201a includes the plurality of pixels, the target data 203a may coincide with the target pixel 201a.
Subsequently, at step S103, the image processor 105 determines a reference pixel acquiring region based on the depth map acquired at step S101. Then, the image processor 105, as described below, extracts a plurality of reference pixels from the reference pixel acquiring region and calculates a weighted average of signal values of the plurality of extracted reference pixels to reduce a noise of the target pixel. In this case, with increasing the number of pixels which are disposed near pixels having structures similar to the target data in the plurality of reference pixels, highly-accurate noise reduction can be performed while keeping the structure of the target pixel. There is a high possibility that the structure similar to that of the target data (in particular, a texture component) exists in an object where the target pixel exists. Furthermore, the same object region can be roughly specified in the input image by using the depth information (depth map, i.e., distance information in the object space). Accordingly, reference data (second data) having a structure similar to that of the target data can be effectively acquired by restricting the reference pixel acquiring region with the use of the depth map.
The reference data are a partial region including the reference pixel, and they correspond to reference data 205a and 205b in FIG. 6A. The reference data will be described in detail when explaining step S104 in FIG. 4. The reference pixel acquiring region is determined based on the depth map, but this embodiment is not limited thereto. For example, the same object region can be roughly acquired similarly based on a color distribution (color distribution information) of the input image. This will be described in Embodiment 2 in detail.
Next, a method of restricting (method of determining) the reference pixel acquiring region based on the depth map will be described. A first method is a method of setting a depth threshold value (first threshold value) which is a threshold value relating to the depth to eliminate a region which has a depth apart from the depth of the target data by not less than the depth threshold value. Referring to FIGs. 6A and 6B, this will be described. Each of FIGs. 6A and 6B is a depth map (distance information) of the input image 200 illustrated in FIG. 5, and shading indicates a value of the depth (i.e., the color is darker with increasing the value of the depth, that is, with increasing a distance). In FIGs. 6A and 6B, for the purpose of simplifying the explanation, outlines of objects (three objects of a circle, a triangle, and a rectangle) are indicated by solid lines, but in an actual depth map, these outlines do not exist. This is because the objects existing at the same depth position (three objects and a floor in FIGs. 6A and 6B) cannot be distinguished. However, the outline can be obtained if the color distribution of the input image or the like is used.
Hereinafter, a case in which the target pixel 201a in FIG. 6A is selected is considered. In FIG. 6A, a reference pixel acquiring region 204a indicated by a dashed line is set on conditions that a region in which a depth is apart from the depth of the target data 203a by a value more than the depth threshold value is excluded. However, a shape and a size of the reference pixel acquiring region 204a are not limited thereto. For example, an object may be cut out based on information relating to a depth or a color, and a whole of the object may be set as a reference pixel acquiring region. For example, the depth threshold value can be determined by creating a histogram for the depth of the input image and by using a mode method, an error minimization of the GMM (Gaussian Mixture Model), or the like. Alternatively, the depth threshold value can be determined based on optical information of the image pickup apparatus 100, and its details will be described below.
Subsequently, a second method as a method of restricting (method of determining) the reference pixel acquiring region based on the depth map will be described. Hereinafter, a case in which a depth in the target data is discontinuous (for example, an edge region) is considered. The determination as to whether the depth is discontinuous or not can be performed by setting a depth differential threshold value that is a threshold value relating to a differential of the depth, and by determining whether an absolute value of the differential of the depth in the target data exceeds the depth differential threshold value. When it is determined that the depth is discontinuous, the target data include an edge structure. Accordingly, the reference pixel acquiring region is similarly restricted to a region in which an edge exists. For example, as illustrated in FIG. 6B, it is assumed that the edge region in which the depth is discontinuous is selected as a target pixel 201b and target data 203b. In this case, a region which has a structure similar to that of a region near the target pixel 201b is only the edge region. Accordingly, the region in which the depth is discontinuous similarly is specified as a reference pixel acquiring region 204b. The method (first method) of restricting the reference pixel acquiring region based on the depth threshold value as described referring to FIG. 6A may be combined. In this embodiment, the two methods as methods of restricting (method of determining) the reference pixel acquiring region based on the depth map are described, but this embodiment is not limited thereto.
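As a concrete illustration of these two restriction methods, the following Python sketch builds a mask of allowed reference pixels from the depth map; the threshold values and the use of a simple depth gradient to detect discontinuity are assumptions made for the example:

```python
import numpy as np

def reference_region_mask(depth, ty, tx, depth_thr=0.1, grad_thr=0.05):
    """Return a boolean mask of pixels usable as reference pixels for the
    target pixel at (ty, tx), based on the depth map alone."""
    # First method: exclude depths that differ from the target depth by more
    # than the depth threshold value (first threshold value).
    mask = np.abs(depth - depth[ty, tx]) <= depth_thr

    # Second method: if the depth around the target is discontinuous (an edge
    # region), keep only regions whose depth is also discontinuous.
    gy, gx = np.gradient(depth)
    grad = np.hypot(gy, gx)
    if grad[ty, tx] > grad_thr:
        mask &= grad > grad_thr
    return mask
```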
The values of the depths in the target data and in the reference data may be values of the depths of the target pixel and the reference pixel (depth average value of each pixel if each of the target pixel and the reference pixel includes a plurality of pixels). Alternatively, an average value of depths of all pixels in the target data or in the reference data may be adopted. In particular, when an accuracy of the depth map (depth reliability) is not high, it is preferred that a depth average value of all the pixels in each of the data is used in order to improve the accuracy. However, when the target data or the reference data are an edge region in which the depth is discontinuous, there is a high possibility that the average value of all the pixels in the data is a different depth. Accordingly, in order to reduce the influence, it is preferred that the depth is determined only by the target pixel or the reference pixel. For example, if it is assumed that the depth of the reference pixel 202d illustrated in FIG. 6B is the depth average value of all the pixels in the reference data 205d, values of depths on the background are mixed and accordingly there is a high possibility that the value is shifted from that of the object (rectangle).
Subsequently, at step S104 in FIG. 4, the image processor 105 acquires a plurality of reference pixels and reference data (second data) from the reference pixel acquiring region. For example, reference pixels 202a and 202b and reference data 205a and 205b illustrated in FIG. 6A or reference pixels 202c and 202d and reference data 205c and 205d (third or subsequent pixels or data are omitted) illustrated in FIG. 6B are acquired. In this embodiment, a size and a shape of each of the reference pixels and the reference data are not limited thereto. The reference pixel and the reference data do not have to coincide with the target pixel and the target data, respectively. This is because the numbers of both pixels can coincide with each other by a size conversion described below. However, the reference data need to include information relating to a distribution of signals, and accordingly they are data relating to a plurality of pixels. When the target pixel is selected from a certain color component (for example, Green) in the input image, the reference pixel and the reference data may be acquired from another color component (Red or Blue).
Subsequently, at step S105, the image processor 105 calculates a correlation value of the target data and the reference data. The correlation value can be calculated by using a feature-based method such as SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust Features) or by using a region-based method described below. The feature-based method focuses on a feature amount, and accordingly the correlation value can be calculated even when the numbers of pixels in the target data and the reference data are different from each other. On the other hand, since the region-based method focuses on differences of signal values, the numbers of pixels in both data need to be matched in order to calculate the correlation exactly. However, the correlation calculated by the region-based method can determine the similarity with higher accuracy than that calculated by the feature-based method, and accordingly it is preferred that the region-based method is used.
Hereinafter, with respect to correlation calculation expressions by the region-based method, two examples will be presented. However, this embodiment is not limited thereto. In the following explanations, for the purpose of simplifying the explanations, the expressions are described by using a single signal without considering color components (RGB), but the expressions can be applied also to a case in which a plurality of color components are included.
A first correlation calculation expression uses a root-mean-square of signal values in the target data and the reference data. When the target data and the reference data are treated as a partial region of an image, i.e., a matrix, a correlation calculation expression g1 (first correlation calculation expression) is represented by the following expression (1).
In expression (1), symbol T is a matrix including components Tij of signal values of respective pixels in the target data, symbol N is the number of lines in the matrix T, symbol M is the number of rows in the matrix T, and symbol Rk is a matrix including components of signal values of respective pixels in k-th reference data. Symbol P satisfies the following expression (2), and symbol Pij denotes components of P.
In expression (2), symbol NRk is the number of lines in the matrix Rk, symbol MRk is the number of rows in the matrix Rk. Symbol σ(Rk,N/NRk,M/MRk) represents a conversion (magnification or reduction of an image) in which the number of lines in the matrix Rk is magnified by N/NRk and the number of rows in the matrix Rk is magnified by M/MRk. For the conversion σ, bilinear interpolation, bicubic interpolation, or the like may be used.
When each of the target data and the reference data is treated as a vector including a component of each signal value, expression (1) is rewritten as represented by the following expression (3).
In expression (3), symbol t is a vector including components ti of the signal values in the target data, symbol rk is a vector including components of signal values in the k-th reference data, symbol ρ is a vector in which the components of the matrix P are rearranged in one dimension, and the components of the vector ρ are ρi.
Each of the correlation calculation expressions represented by expressions (1) and (3) is an expression relating to a difference between the target data and the reference data, and accordingly, it means that the similarity of both data is higher as the value is closer to zero.
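Expressions (1) to (3) appear only as figures in the original publication and are not reproduced in this text. Based solely on the symbol definitions above (a root-mean-square of the differences, with the reference data resized to the size of the target data), a plausible reconstruction, not the authoritative expressions, is:

$$ g_1(T, R_k) = \sqrt{\frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}\bigl(T_{ij} - P_{ij}\bigr)^2} \tag{1} $$

$$ P = \sigma\!\left(R_k,\ \frac{N}{N_{R_k}},\ \frac{M}{M_{R_k}}\right) \tag{2} $$

$$ g_1(\mathbf{t}, \mathbf{r}_k) = \sqrt{\frac{1}{NM}\sum_{i}\bigl(t_i - \rho_i\bigr)^2} \tag{3} $$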
In this embodiment, a DC (direct-current) component (i.e., average value corresponding to a brightness of an image) may be subtracted from a signal in each of the target data and the reference data. The correlation calculation determines the similarity level of the structures of the target data and the reference data, and therefore the brightness (direct-current component) is irrelevant to the correlation. A contrast in the reference data may be adjusted such that the correlation of them is maximized. This corresponds to multiplication of an AC (alternating-current) component by a scalar. In this case, expression (1) is rewritten as represented by the following expression (4).
In expression (4), symbols Tave and Pave are average values of signal values in the matrixes T and P, respectively, and these average values may be calculated by a uniform weight or alternatively weighted averaging may be adopted. Symbol c is a coefficient which adjusts the contrast, and it is represented by the following expression (5) based on the method of least squares.
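Expressions (4) and (5) are likewise not reproduced here. Assuming the average values Tave and Pave are subtracted and the contrast coefficient c is fitted by least squares as described, a plausible reconstruction is:

$$ g_1 = \sqrt{\frac{1}{NM}\sum_{i,j}\Bigl[\bigl(T_{ij} - T_{\mathrm{ave}}\bigr) - c\,\bigl(P_{ij} - P_{\mathrm{ave}}\bigr)\Bigr]^2} \tag{4} $$

$$ c = \frac{\sum_{i,j}\bigl(T_{ij} - T_{\mathrm{ave}}\bigr)\bigl(P_{ij} - P_{\mathrm{ave}}\bigr)}{\sum_{i,j}\bigl(P_{ij} - P_{\mathrm{ave}}\bigr)^2} \tag{5} $$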
When the correlation value is calculated by using expression (4), it is necessary to adjust the brightness and the contrast of the reference pixel similarly in calculating the weighted average at step S107 in FIG. 4.
Subsequently, a second correlation calculation expression as a region-based correlation calculation expression will be described. The second correlation calculation expression uses the SSIM (Structure Similarity), and it is represented by the following expression (6).
In expression (6), symbols L, C, and S are evaluation functions relating to a brightness, a contrast, and other structures, respectively, each of which takes a value from 0 to 1. A value closer to 1 means that the two signals being compared are closer to each other. Symbols α, β, and γ are parameters which adjust the weights of the respective evaluation items. When α=0 is satisfied, the correlation calculation is performed while eliminating the DC component (brightness), and when β=0 is satisfied, the multiplication of the AC component by a scalar (adjustment of the contrast) need not be considered; accordingly, an evaluation similar to expression (4) can be performed.
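Expression (6) is also not reproduced in this text. Assuming the standard SSIM form with the weighting exponents described above, it plausibly reads:

$$ g_2 = \bigl[L(\mathbf{t}, \boldsymbol{\rho})\bigr]^{\alpha}\,\bigl[C(\mathbf{t}, \boldsymbol{\rho})\bigr]^{\beta}\,\bigl[S(\mathbf{t}, \boldsymbol{\rho})\bigr]^{\gamma} \tag{6} $$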
In this embodiment, when calculating the correlation value, a plurality of correlation calculation expressions may be combined. When performing the region-based correlation calculation, for example using the first correlation calculation expression or the second correlation calculation expression, isometric transformation may be performed for the reference data so that the correlation value of the target data and the reference data is maximized. The isometric transformation is for example identity transformation, rotation transformation, or inversion transformation. In this case, the transformation by which the correlation value is maximized is performed also for the reference pixel at step S107 in FIG. 4. By finding the reference data with higher similarity, an effect of the noise reduction can be improved. In particular, when both the target data and the reference data contain edge information, the similarity tends to be increased by performing the isometric transformation. In this case, however, a calculation amount increases, and accordingly it is preferred that whether the isometric transformation is to be performed is determined by comparing the effect of the noise reduction and the calculation amount.
Subsequently, at step S106 in FIG. 4, the image processor 105 determines a weight (weight coefficient) for each of the plurality of reference data based on the correlation value calculated at step S105. As the correlation increases, the reference data are more similar to the target data and accordingly the weight is set to be larger. For example, the weight is determined as represented by the following expression (7) by using expression (3).
In expression (7), symbol wk is a weight corresponding to the k-th reference data, and symbol h is a strength of a filter. Symbol Z is a normalization factor of the weight wk, which satisfies the following expression (8).
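Expressions (7) and (8) are not reproduced in this text. Assuming the usual non-local-means style weight built from the correlation value of expression (3), with filter strength h and normalization factor Z, a plausible reconstruction is:

$$ w_k = \frac{1}{Z}\exp\!\left(-\frac{g_1(\mathbf{t}, \mathbf{r}_k)^2}{h^2}\right) \tag{7} $$

$$ Z = \sum_{k}\exp\!\left(-\frac{g_1(\mathbf{t}, \mathbf{r}_k)^2}{h^2}\right), \qquad \sum_{k} w_k = 1 \tag{8} $$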
In this embodiment, however, the method of determining the weight is not limited thereto. For example, a table of weights corresponding to respective correlation values may be previously stored to determine the weight referring to this table.
Subsequently, at step S107, the image processor 105 calculates a weighted average of the signal values of the reference pixels by using the weights determined at step S106. Then, the image processor 105 replaces the signal value of the target pixel with the calculated weighted average (weighted average signal value). Thus, the noise reduction of the target pixel is completed. For example, a weighted average signal value save is calculated by the following expression (9).
In expression (9), symbol sk is a signal value of the reference pixel in the k-th reference data. When each of the target pixel and the reference pixel includes a plurality of pixels, each of the signal value sk and the weighted average signal value save is a vector quantity. However, the method of calculating the weighted average is not limited thereto, and other methods such as nonlinear combination may be used.
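Expression (9) is not reproduced here; from the description of the weighted average of the reference pixel signal values, it plausibly reads:

$$ s_{\mathrm{ave}} = \sum_{k} w_k\, s_k \tag{9} $$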
When the DC component is subtracted and the contrast is adjusted during the correlation calculation at step S105, it is necessary to adjust the brightness and the contrast corresponding to the reference pixel before obtaining the weighted average. This is the same with respect to the size transformation in expression (2) or the isometric transformation. In this embodiment, the replacement processing is used for the noise reduction, and learning-type noise reduction processing can also be performed by using the weighted average signal value.
Subsequently, at step S108, the image processor 105 determines whether the processing on a predetermined region in the input image is finished. If the processing for all pixels on which the noise reduction is to be performed has not been completed, the flow returns to step S102 and the image processor 105 selects a new target pixel. On the other hand, if the processing for all the pixels on which the noise reduction is to be performed has been completed, the flow is finished. According to the processing described above, the loss of a texture component caused by the noise reduction of an image can be reduced, and the noise reduction can be performed with high accuracy.
Next, a preferred condition that enhances the effect of this embodiment will be described. It is preferred that this embodiment includes a step of acquiring optical information of the image pickup apparatus 100 capturing the input image and that the depth threshold value used at step S103 is determined based on the optical information. Referring to FIGs. 7 to 9A-9C, this will be described. FIG. 7 is a schematic diagram of the image pickup apparatus 100 and the object space. FIGs. 8A to 8C are point spread functions (PSFs). FIGs. 9A to 9C are modulation transfer functions (MTFs).
In FIG. 7, x, y, and z represent the x, y, and z axes in a three-dimensional coordinate system, respectively, and the z axis indicates a depth direction. The image pickup apparatus 100 focuses on an in-focus plane 211. Hereinafter, an out-of-focus plane 212 will be considered. The point spread function of the image pickup apparatus 100 corresponding to the out-of-focus plane 212 is represented as illustrated in FIG. 8A. For the purpose of simplifying explanations, only a component of a cross section of the point spread function that satisfies y=0 is depicted. FIG. 9A is an MTF corresponding to the point spread function of FIG. 8A. Symbol fx is a spatial frequency in an x-axis direction, and only a positive quadrant is illustrated for simplicity. Symbol fmax denotes the spatial frequency at which the value of the MTF becomes zero in FIG. 9A.
Since a blur caused by defocusing exists in the image pickup apparatus 100, on a plane apart from the in-focus plane 211 in the z direction, an acquirable maximum frequency decreases. For example, as illustrated in FIG. 8B, a point image on the out-of-focus plane 213 in FIG. 7 is captured with a blur more than that in FIG. 8A. FIG. 9B is an MTF corresponding to the point spread function of FIG. 8B. As can be seen in FIG. 9B, the acquirable maximum frequency is decreased compared to the case of FIG. 9A. An out-of-focus plane 214 in FIG. 7 is a plane further apart from the in-focus plane 211 compared to the out-of-focus plane 213. FIG. 8C is a point spread function on the out-of-focus plane 214, and FIG. 9C is an MTF of the point spread function of FIG. 8C.
As described above, in the image pickup apparatus 100, acquirable information (frequency component) varies depending on a depth. In other words, even when an object having a similar structure exists in each of the out-of- focus planes 212 and 214, an actually captured image is an image on which the point spread function of FIG. 8A or 8C is superimposed. Thus, since the acquirable frequency bands are different, the images have different structures on the input image.
Considering the above, even if the target data and the reference data are acquired from depths where the acquirable frequency bands are extremely different, there is a low possibility that they have similar structures. The acquirable frequency component at each depth can be calculated based on the optical information of the image pickup apparatus 100. Accordingly, in this embodiment, it is preferred that the reference pixel acquiring region is determined depending on the optical information. The optical information includes, for example, a focusing distance, a focal length, an F-number, an optical transfer function (OTF), a point spread function (PSF), and a spread amount of an image caused by aberration, diffraction, or defocusing. The acquirable frequency component can be obtained exactly if the OTF, the MTF, or the PSF at each depth is known, and it can be obtained approximately from the focal length and the F-number. When the input image is obtained by synthesizing a plurality of parallax images, the F-number is determined, for example, based on the aperture obtained by synthesizing the apertures of the imaging optical systems acquiring the respective parallax images, and the other optical information is likewise determined in accordance with the synthesis of the images.
Next, a method of determining the depth threshold value using the optical information will be described. In a first determination method, for example, a threshold frequency fthr is set for the depth of the target data, and the boundary depth at which the MTF at the frequency fthr is greater than or equal to a predetermined value rthr is defined as the depth threshold value. An example of the reference pixel acquiring region determined by this method is the reference pixel acquiring depth range 215 in FIG. 7. The depth at which the spread of an image caused by defocusing becomes the same differs in front of and behind the in-focus plane 211, and accordingly the depth threshold values also differ in front of and behind the out-of-focus plane 212, as illustrated in FIG. 7. In this embodiment, the case in which the target data exist on the out-of-focus plane 212 is described, and this embodiment is similarly applied to the case in which the target data exist on another plane, such as the in-focus plane 211. With this method of determining the depth threshold value, depths at which frequency components acquirable at the depth of the target data cannot be sufficiently acquired can be removed from the reference pixel acquiring region. Accordingly, this determination method is especially effective when the target data are acquired at a depth close to the in-focus plane.
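A minimal sketch of this first determination method, assuming the MTF at the chosen frequency fthr has been sampled for a set of candidate depths from the optical information (that sampling itself is not shown), might look as follows:

```python
import numpy as np

def depth_range_from_mtf(depths, mtf_at_fthr, target_depth, r_thr):
    """Return the (near, far) limits of the reference pixel acquiring depth
    range: the contiguous run of depths around the target depth whose MTF at
    the frequency f_thr stays at or above r_thr."""
    ok = mtf_at_fthr >= r_thr
    idx = int(np.argmin(np.abs(depths - target_depth)))  # sample nearest the target depth
    lo = idx
    while lo > 0 and ok[lo - 1]:
        lo -= 1
    hi = idx
    while hi < len(depths) - 1 and ok[hi + 1]:
        hi += 1
    return depths[lo], depths[hi]
```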
Subsequently, in a second method as a method of determining the depth threshold value, the depth threshold value is determined depending on whether the depth of the target data and the shape of the point spread function (or MTF, or the like) are similar. The similarity of the point spread function may be determined for example by the calculation using an expression similar to expression (1). By replacing an intensity of the point image with a signal value of a pixel, the similar calculation can be performed. An edge of the depth where the similarity satisfies a predetermined condition may be set as a depth threshold value. In the second determination method, the depth in which the acquirable frequency band is different from that of the depth of the target data (i.e., depth in which the frequency is insufficient or excessive) can be removed from the reference pixel acquiring region.
More preferably, the depth threshold value used at step S103 is determined by considering frequency characteristics of the target data in addition to the optical information of the image pickup apparatus 100. The structure of the target data depends on not only the point spread function in the depth of the target data but also a structure of an object. Hereinafter, a case in which the target data are acquired from the out-of-focus plane 212 in FIG. 7 will be considered.
FIGs. 10A and 10B are frequency characteristics of the target data in this case. FIG. 10A illustrates a case in which an object included in the target data has a fine structure, and FIG. 10B illustrates a case in which an object in the target data has a rough structure (only with a low frequency). The structure similar to the target data having the frequency characteristics of FIG. 10A exists only in an acquirable depth up to a high-frequency component (near the in-focus plane 211 or the out-of-focus plane 212). On the other hand, there is a possibility that the structure similar to the target data of FIG. 10B exists over a wide range of the depth. Accordingly, it is preferred that the depth threshold value changes depending on the frequency characteristics of the target data. For example, a frequency at which a spectral intensity is less than or equal to a predetermined value rthr is referred to as a threshold value fthr, and as described above, the first determination method of the depth threshold value using the optical information may be used. As a result, since the threshold value fthr of the frequency varies depending on the frequency of the target data, the depth threshold value also changes.
Preferably, this embodiment includes a step of acquiring a map of a depth reliability that represents an accuracy of the depth map of the input image, and processing is switched when determining the reference pixel acquiring region at step S103 depending on the depth reliability. In this embodiment, the depth map is calculated based on the parallax images, and accordingly the accuracy of estimation of the depth is decreased if for example the number of corresponding points of the parallax images is small. Similarly, even in the TOF method or a method of acquiring the depth using a structured illumination, the accuracy of acquisition may be decreased due to a disturbance or characteristics of an object surface. When these low-accuracy depths are used for the processing, the effect of this embodiment is reduced. Therefore, it is preferred that the processing is switched depending on the depth reliability. For example, a threshold value (second threshold value) relating to the depth reliability is set, and a region in which the depth reliability is lower than the second threshold value is removed from the reference pixel acquiring region. Accordingly, the possibility that the reference pixel having a different structure from that of a pixel near the target pixel is synthesized at step S107 can be decreased.
Preferably, this embodiment sets a threshold value (third threshold value) relating to a depth reliability different from the depth reliability described above, and the reference pixel acquiring region is determined without using the depth map at step S103 when the depth reliability of the target data is lower than the third threshold value. If the reference pixel acquiring region is determined by using the depth when the depth reliability of the target data is low, there is a possibility that the reference data are acquired only from an object different from the target data. Accordingly, when the reliability is lower than the third threshold value, the reference pixel acquiring region is determined independently of depth information, and as a result the acquisition of only the reference data with a similarity lower than that of the target data can be avoided.
When the depth is calculated based on the parallax images, the depth reliability may be defined such that the reliability increases in a region where there are a lot of corresponding points of the parallax images or there is an edge with strong intensity. This is because the accuracy of calculating the depth is improved in the region where there are many corresponding points or there is a strong edge portion. In this configuration, according to this embodiment, an image pickup apparatus which is capable of performing highly-accurate noise reduction of an image can be provided.
Next, referring to FIGs. 11 and 12, an image processing system capable of performing an image processing method in Embodiment 2 of the present invention will be described. FIG. 11 is a block diagram of an image processing system 300 in this embodiment. FIG. 12 is an external view of the image processing system 300. In the image processing system 300 of this embodiment, the image pickup apparatus and the image processing apparatus which performs the noise reduction processing are provided separately from each other; color information is used to restrict the reference pixel acquiring region in the noise reduction processing, and depth information is used in calculating the weight of the reference data.
The input image acquired by an image pickup apparatus 301 is output to an image processing apparatus 302 via a communication device 303. The image pickup apparatus 301 is capable of acquiring a parallax image, and a storage device 304 (memory) stores a depth map acquired from parallax information and optical information determined at the time of capturing the input image. A noise reduction device 305 (image processor) performs noise reduction processing (image processing method) on the input image to generate an output image based on the input image. The output image processed by the noise reduction device 305 is output to at least one of a display apparatus 306, a recording medium 307, and an output device 308 via the communication device 303. The display apparatus 306 is for example a liquid crystal display or a projector. A user can work while confirming the image under processing through the display apparatus 306. A recording medium 307 is for example a semiconductor memory, a hard disk, or a server on a network. The output device 308 is for example a printer. The image processing apparatus 302 has a function that performs development processing and other image processing as needed.
Next, referring to FIG. 13, the configuration of a parallax image acquirer in the image pickup apparatus 301 will be described. FIG. 13 is a schematic diagram of the parallax image acquirer in the image pickup apparatus 301. The parallax image acquirer includes an imaging optical system 301a, a lens array 301b, and an image pickup element 301c.
The lens array 301b is disposed on a plane conjugate to an in-focus plane 311 via the imaging optical system 301a. The lens array 301b is configured so that an exit pupil of the imaging optical system 301a and the image pickup element 301c approximately have a conjugate relation. Light rays from an object space pass through the imaging optical system 301a and the lens array 301b, and they enter different pixels from each other of the image pickup element 301c depending on a pupil region (i.e., viewpoint) of the imaging optical system 301a through which the light rays pass. Thus, the parallax images are acquired. In FIG. 13, a viewpoint is divided into five viewpoints in one-dimensional direction, and accordingly 25 parallax images are acquired in two dimensions. However, the number of the viewpoints is not limited thereto.
The configuration illustrated in FIG. 13 is called a Plenoptic 1.0 camera, which is described in Japanese Patent No. 4752031 in detail. As a Plenoptic camera capable of obtaining parallax images, for example, other configurations as disclosed in US Patent No. 7962033 may be adopted. An object does not need to exist on the in-focus plane 311 (i.e., focusing may be performed on a space where nothing exists). This is because a focus position control can be performed after capturing images, which is called refocusing, by synthesizing the parallax images.
Next, referring to FIGs. 14 and 15, the noise reduction processing which is performed by the noise reduction device 305 illustrated in FIG. 11 will be described in detail. FIG. 14 is a flowchart of the noise reduction processing in this embodiment. FIG. 15 is an explanatory diagram of the depth map in this embodiment. Each step of FIG. 14 is performed mainly by the noise reduction device 305 based on an instruction of a system controller (not illustrated) included in the image processing apparatus 302. In FIG. 14, the same descriptions as those described in Embodiment 1 referring to FIG. 4 are omitted.
Steps S201 and S202 are the same as steps S101 and S102 in FIG. 4, respectively. Subsequently, at step S203, the noise reduction device 305 determines the reference pixel acquiring region based on color distribution (color distribution information) of the input image.
In the depth map illustrated in FIG. 15, a reference pixel acquiring region 404 indicated by a dashed line is set as a region which is located within a range of the predetermined number of pixels in a horizontal direction and in a vertical direction with reference to a target pixel 401 and has similar color information to that in a target data 403 (in this embodiment, a circle and a triangle are objects with similar colors). However, the method of determining the reference pixel acquiring region 404 is not limited thereto. By restricting the reference pixel acquiring region 404 to a region which has a similar color, acquisition of the reference pixel from an object having a different structure can be avoided. For dividing regions based on colors, for example K-means clustering can be used. Alternatively, a color space may be previously divided into some groups, and the reference pixel acquiring region 404 may be determined depending on the group to which each pixel of the input image belongs.
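A minimal sketch of such a color-based restriction, using a simple K-means clustering in RGB space (the cluster count, window size, and choice of color space are assumptions for illustration), could be:

```python
import numpy as np

def color_region_mask(rgb, ty, tx, k=8, window=50, iters=10):
    """Return a boolean mask of pixels that belong to the same color group as
    the target pixel (ty, tx) and lie within a window around it."""
    H, W, _ = rgb.shape
    pixels = rgb.reshape(-1, 3).astype(np.float64)

    # plain K-means in RGB space
    rng = np.random.default_rng(0)
    centers = pixels[rng.choice(len(pixels), k, replace=False)]
    for _ in range(iters):
        dists = np.stack([((pixels - c) ** 2).sum(axis=1) for c in centers], axis=1)
        labels = np.argmin(dists, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pixels[labels == c].mean(axis=0)
    dists = np.stack([((pixels - c) ** 2).sum(axis=1) for c in centers], axis=1)
    labels = np.argmin(dists, axis=1).reshape(H, W)

    # same color group as the target pixel, restricted to a window around it
    mask = labels == labels[ty, tx]
    win = np.zeros((H, W), dtype=bool)
    win[max(0, ty - window):ty + window + 1, max(0, tx - window):tx + window + 1] = True
    return mask & win
```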
Subsequently, steps S204 and S205 in FIG. 14 are the same as steps S104 and S105 in FIG. 4, respectively. For example, as illustrated in FIG. 15, the noise reduction device 305 acquires reference pixels 402a and 402b and reference data 405a and 405b (third or subsequent pixels or data are omitted) to calculate a correlation value of target data 403 and the reference data.
Subsequently, at step S206, the noise reduction device 305 determines the weight of the reference data by using the depth map and the correlation value calculated at step S205. There is a high possibility that reference data existing in the same object as that of the target data have a similar structure. Accordingly, the weight is increased as the depths of the target data and the reference data are closer to each other. For example, in FIG. 15, the weight of the reference pixel 402a existing at a depth similar to that of the target pixel 401 is increased, while the weight of the reference pixel 402b, which has a different depth from that of the target pixel 401, is decreased. For example, the weight is determined as represented by the following expression (10).
In expression (10), symbol vk is a weight corresponding to the k-th reference data, symbol D0 is a depth of the target data, symbol Dk is a depth of the k-th reference data, and symbol d is a scaling parameter of the depth. Symbol Ω1 is a normalization factor of the weight vk, which satisfies the following expression (11).
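Expressions (10) and (11) are not reproduced in this text. One plausible reconstruction, assuming the depth term is combined multiplicatively with the correlation-based factor of expression (7) (this combination is an assumption, not stated in the text), is:

$$ v_k = \frac{1}{\Omega_1}\exp\!\left(-\frac{g_1(\mathbf{t}, \mathbf{r}_k)^2}{h^2}\right)\exp\!\left(-\frac{(D_0 - D_k)^2}{d^2}\right) \tag{10} $$

$$ \Omega_1 = \sum_{k}\exp\!\left(-\frac{g_1(\mathbf{t}, \mathbf{r}_k)^2}{h^2}\right)\exp\!\left(-\frac{(D_0 - D_k)^2}{d^2}\right) \tag{11} $$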
In this embodiment, however, the method of determining the weight is not limited thereto. Furthermore, the weight may be determined by using color information, as well as the depth. In this case, expression (10) is rewritten as represented by the following expression (12).
In expression (12), symbol uk is a weight corresponding to the k-th reference data, symbol Γ0k is a distance between averaged pixel values in the target data and the k-th reference data in a color space, and symbol γ is a scaling parameter of the distance. Symbol Ω2 is a normalization factor of the weight uk, which satisfies the following expression (13).
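Expressions (12) and (13) are not reproduced here. Assuming the color term is a further multiplicative factor based on the color-space distance Γ0k (again an assumption), a plausible reconstruction is:

$$ u_k = \frac{1}{\Omega_2}\exp\!\left(-\frac{g_1(\mathbf{t}, \mathbf{r}_k)^2}{h^2}\right)\exp\!\left(-\frac{(D_0 - D_k)^2}{d^2}\right)\exp\!\left(-\frac{\Gamma_{0k}^2}{\gamma^2}\right) \tag{12} $$

$$ \Omega_2 = \sum_{k}\exp\!\left(-\frac{g_1(\mathbf{t}, \mathbf{r}_k)^2}{h^2}\right)\exp\!\left(-\frac{(D_0 - D_k)^2}{d^2}\right)\exp\!\left(-\frac{\Gamma_{0k}^2}{\gamma^2}\right) \tag{13} $$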
Steps S207 and S208 are the same as steps S107 and S108 in FIG. 4, respectively. According to the processing described above, a loss of a texture component caused by noise reduction of an image can be decreased to perform highly-accurate noise reduction.
Next, a preferred condition that enhances the effect of this embodiment will be described. First, at step S206, the noise reduction device 305 acquires information relating to a resolution limit of the reference data based on the optical information of the image pickup apparatus 301 capturing the input image and the depth map. Then, it is preferred that the noise reduction device 305 changes the weight depending on the information relating to the acquired resolution limit and the frequency characteristics of the reference data. As described in Embodiment 1, the input image is blurred due to the aberration, the diffraction, or the defocusing, and accordingly there is a spatial frequency (resolution limit) as a limit of the resolution in the input image. Therefore, if there is a higher frequency component than the resolution limit, it is a noise component.
If optical characteristics of the image pickup apparatus 301 and the depth map of the object space are known, the resolution limit can be calculated in each region of the input image. Therefore, the resolution limit of the reference data is calculated to be compared with the frequency characteristics of the reference data, and thus a part of a generated noise amount can be evaluated. It is believed that the reference data in which the MTF is large at a higher frequency than the resolution limit contain a large amount of noise, and therefore it is preferred that the weight is decreased. Accordingly, the effect of the noise reduction can be further improved.
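A minimal sketch of this evaluation, assuming the resolution limit f_limit (in cycles per pixel) has already been derived from the optical information and the depth map, and using the fraction of spectral energy above that limit as a simple noise indicator (this particular measure is an assumption), might be:

```python
import numpy as np

def high_freq_energy_ratio(ref_patch, f_limit):
    """Fraction of the spectral energy of a (square) reference-data patch that
    lies above the resolution limit; a large value suggests the patch is
    noise-dominated and its weight should be reduced."""
    n = ref_patch.shape[0]
    spec = np.abs(np.fft.fftshift(np.fft.fft2(ref_patch)))
    f = np.fft.fftshift(np.fft.fftfreq(n))
    fy, fx = np.meshgrid(f, f, indexing='ij')
    radius = np.hypot(fy, fx)
    total = spec.sum()
    return float(spec[radius > f_limit].sum() / total) if total > 0 else 0.0
```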
Similarly to Embodiment 1, it is preferred that a map of a depth reliability is acquired and the weight is set to be smaller as the depth reliability of the reference data is lower. Thus, an effect of the noise reduction in this embodiment can be achieved with higher accuracy. Similarly to Embodiment 1, when the depth reliability of the target data is low, it is preferred that the weight is determined independently of the depth. According to the configuration described above, an image processing system which is capable of performing highly-accurate noise reduction of an image can be provided.
Next, referring to FIGs. 16 and 17, an image pickup system which is capable of performing an image processing method in Embodiment 3 of the present invention will be described. FIG. 16 is a block diagram of an image pickup system 500 in this embodiment. FIG. 17 is an external view of the image pickup system 500. In this embodiment, an image pickup apparatus is connected to a server via a wireless or wired network, and an image processor in the server is capable of performing noise reduction processing on an image transferred from the image pickup apparatus to the server.
An image pickup apparatus 501 includes a TOF (Time of Flight) image pickup element, and it is capable of acquiring an input image and a depth map (distance information) of the input image by photographing. A server 503 (image processing apparatus) includes a communication device 504 (communication circuit), and it is connected to the image pickup apparatus 501 via a wireless or wired network 502. When the image pickup apparatus 501 captures an image, the input image (captured image) and the depth map are automatically or manually input to the server 503, and a storage device 505 (memory or memory circuit) stores the input image and the depth map. In this case, if needed, optical information of the image pickup apparatus 501 is also stored in the storage device 505. An image processor 506 (image processing circuit) performs noise reduction processing (image processing method) on the input image to generate an output image based on the input image. The processed output image is output to the image pickup apparatus 501 or stored in the storage device 505. The image processing method (noise reduction processing) in this embodiment is the same as that of Embodiment 1 or Embodiment 2 described referring to FIG. 4 or FIG. 14, and accordingly descriptions thereof are omitted.
As described above, the image processing method in each embodiment acquires first data (target data) relating to a partial region including a target pixel from an input image (S102, S202). Subsequently, the method acquires a plurality of second data (a plurality of reference data) relating to a plurality of partial regions, each including one of a plurality of reference pixels (S104, S204). Then, the method determines a weight with respect to each of the plurality of second data depending on a correlation between the first data and each of the plurality of second data (S106, S206), and generates an output pixel (output image including the output pixel) corresponding to the target pixel based on the plurality of reference pixels and the weight (S107, S207). In this case, at least one of the plurality of reference pixels and the weight is determined based on distance information of the input image. For example, a part of steps in each embodiment can be excluded, or at least a part of steps in Embodiments 1 and 2 can be combined.
Preferably, the image processing method determines a reference pixel acquiring region depending on the target pixel (S103, S203), and the plurality of reference pixels are selected from the reference pixel acquiring region. Preferably, at least one of the plurality of reference pixels and the weight is determined based on color distribution information of the input image. Preferably, when generating the output pixel, a signal value of the target pixel is replaced with a signal value calculated based on signal values of the plurality of reference pixels and the weight to generate the output pixel. Preferably, the distance information of the input image is a depth map of an object space in the input image.
Preferably, when determining the reference pixel acquiring region (S103), the reference pixel acquiring region is determined so as not to include a region having a depth different from a depth in the first data by at least a first threshold value. More preferably, the first threshold value is determined depending on optical information of an image pickup apparatus capturing the input image. More preferably, the optical information contains a modulation transfer function (MTF) or a point spread function (PSF) of the image pickup apparatus. More preferably, the first threshold value is determined depending on frequency characteristics of the first data. More preferably, when determining the reference pixel acquiring region (S103), the reference pixel acquiring region varies depending on a differential value (edge region) of a depth in the first data.
Preferably, when determining the weight (S206), the weight decreases with increasing a difference between a depth in the first data and a depth in the second data. Preferably, when determining the weight (S206), information relating to a resolution limit of the second data is acquired based on the depth map and optical information of the image pickup apparatus capturing the input image, and the weight changes depending on the information relating to the resolution limit and frequency characteristics of the second data.
Preferably, when determining at least one of the reference pixel acquiring region or the weight, whether the depth map is to be considered or not is determined depending on a depth reliability relating to an accuracy of the depth map. More preferably, when determining the reference pixel acquiring region (S103), the reference pixel acquiring region is determined so as not to include a region which has the depth reliability lower than a second threshold value. More preferably, when determining the weight (S206), the weight decreases with decreasing the depth reliability in the second data. More preferably, when the depth reliability in the first data is lower than a third threshold value, at least one of the reference pixel acquiring region and the weight is determined independently of the depth map.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a 'non-transitory computer-readable storage medium') to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)TM), a flash memory device, a memory card, and the like.
According to each embodiment, an image processing method, an image processing apparatus, an image pickup apparatus, a program, and a storage medium which are capable of performing noise reduction of an image with high accuracy can be provided.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
200 INPUT IMAGE
201a, 201b TARGET PIXEL
202a-202d REFERENCE PIXEL
203a, 203b TARGET DATA (FIRST DATA)
204a, 204b REFERENCE PIXEL ACQUIRING REGION
205a-205d REFERENCE DATA (SECOND DATA)
Claims (20)
- An image processing method comprising the steps of:
acquiring first data relating to a partial region including a target pixel from an input image;
acquiring a plurality of second data relating to a plurality of partial regions, each including one of a plurality of reference pixels;
determining a weight with respect to each of the plurality of second data depending on a correlation between the first data and each of the plurality of second data; and
generating an output pixel corresponding to the target pixel based on the plurality of reference pixels and the weight,
wherein at least one of the plurality of reference pixels and the weight is determined based on distance information of the input image.
- The image processing method according to claim 1, further comprising the step of determining a reference pixel acquiring region depending on the target pixel,
wherein the plurality of reference pixels are selected from the reference pixel acquiring region.
- The image processing method according to claim 1 or 2, wherein at least one of the plurality of reference pixels and the weight is determined based on color distribution information of the input image.
- The image processing method according to any one of claims 1 to 3, wherein the step of generating the output pixel includes replacing a signal value of the target pixel with a signal value calculated based on signal values of the plurality of reference pixels and the weight to generate the output pixel.
- The image processing method according to any one of claims 1 to 4, wherein the distance information of the input image is a depth map of an object space in the input image.
- The image processing method according to claim 2, wherein the step of determining the reference pixel acquiring region includes determining the reference pixel acquiring region so as not to include a region having a depth different from a depth in the first data by at least a first threshold value.
- The image processing method according to claim 6, wherein the first threshold value is determined depending on optical information of an image pickup apparatus capturing the input image.
- The image processing method according to claim 7, wherein the optical information contains a modulation transfer function or a point spread function of the image pickup apparatus.
- The image processing method according to any one of claims 6 to 8, wherein the first threshold value is determined depending on frequency characteristics of the first data.
- The image processing method according to claim 2, wherein the step of determining the reference pixel acquiring region includes varying the reference pixel acquiring region depending on a differential value of a depth in the first data.
- The image processing method according to claim 5, wherein the step of determining the weight includes decreasing the weight with increasing a difference between a depth in the first data and a depth in the second data.
- The image processing method according to claim 5, wherein the step of determining the weight includes:
acquiring information relating to a resolution limit of the second data based on the depth map and optical information of an image pickup apparatus capturing the input image, and
changing the weight depending on the information relating to the resolution limit and frequency characteristics of the second data.
- The image processing method according to claim 5, wherein when determining at least one of the plurality of reference pixels or the weight, whether the depth map is to be considered or not is determined depending on a depth reliability relating to an accuracy of the depth map.
- The image processing method according to claim 2, wherein the step of determining the reference pixel acquiring region includes determining the reference pixel acquiring region so as not to include a region which has a depth reliability lower than a second threshold value.
- The image processing method according to claim 13, wherein the step of determining the weight includes decreasing the weight with decreasing the depth reliability in the second data.
- The image processing method according to claim 13, wherein when the depth reliability in the first data is lower than a third threshold value, at least one of the plurality of reference pixels and the weight is determined independently of the depth map.
- An image processing apparatus comprising:
a storage device configured to store an input image; and
an image processor configured to generate an output image based on the input image,
wherein the image processor is configured to:
acquire first data relating to a partial region including a target pixel from the input image,
acquire a plurality of second data relating to a plurality of partial regions, each including one of a plurality of reference pixels,
determine a weight with respect to each of the plurality of second data depending on a correlation between the first data and each of the plurality of second data, and
generate the output image including an output pixel corresponding to the target pixel based on the plurality of reference pixels and the weight, and
wherein at least one of the plurality of reference pixels and the weight is determined based on distance information of the input image.
- An image pickup apparatus comprising:
an image pickup element configured to photoelectrically convert an optical image formed via an optical system to output image data; and
an image processor configured to generate an output image from an input image based on the image data,
wherein the image processor is configured to:
acquire first data relating to a partial region including a target pixel from the input image,
acquire a plurality of second data relating to a plurality of partial regions, each including one of a plurality of reference pixels,
determine a weight with respect to each of the plurality of second data depending on a correlation between the first data and each of the plurality of second data, and
generate the output image including an output pixel corresponding to the target pixel based on the plurality of reference pixels and the weight, and
wherein at least one of the plurality of reference pixels and the weight is determined based on distance information of the input image.
- A program which causes a computer to execute a process comprising the steps of:
acquiring first data relating to a partial region including a target pixel from an input image;
acquiring a plurality of second data relating to a plurality of partial regions, each including one of a plurality of reference pixels;
determining a weight with respect to each of the plurality of second data depending on a correlation between the first data and each of the plurality of second data; and
generating an output pixel corresponding to the target pixel based on the plurality of reference pixels and the weight,
wherein at least one of the plurality of reference pixels and the weight is determined based on distance information of the input image.
- A storage medium which stores the program according to claim 19.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2015006328A JP6624785B2 (en) | 2015-01-16 | 2015-01-16 | Image processing method, image processing device, imaging device, program, and storage medium |
| JP2015-006328 | 2015-01-16 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2016113805A1 true WO2016113805A1 (en) | 2016-07-21 |
Family
ID=56405374
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2015/006324 Ceased WO2016113805A1 (en) | 2015-01-16 | 2015-12-18 | Image processing method, image processing apparatus, image pickup apparatus, program, and storage medium |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP6624785B2 (en) |
| WO (1) | WO2016113805A1 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108510536B (en) * | 2017-02-28 | 2021-09-21 | 富士通株式会社 | Depth estimation method and depth estimation apparatus for multi-viewpoint image |
2015
- 2015-01-16: JP application JP2015006328A (JP6624785B2), status: Active
- 2015-12-18: WO application PCT/JP2015/006324 (WO2016113805A1), status: Ceased
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120268623A1 (en) * | 2004-05-05 | 2012-10-25 | Centre National De La Recherche Scientifique-Cnrs | Image data processing method by reducing image noise, and camera integrating means for implementing said method |
| JP2014112783A (en) * | 2012-12-05 | 2014-06-19 | Canon Inc | Image processing device, image processing method and program, and image pickup device provided with image processing device |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110795988A (en) * | 2018-08-02 | 2020-02-14 | 三星电子株式会社 | Method and apparatus for processing data corresponding to fingerprint images |
| CN111292233A (en) * | 2018-12-06 | 2020-06-16 | 成都微晶景泰科技有限公司 | Lens array image splicing method and device and storage medium |
| CN111292233B (en) * | 2018-12-06 | 2023-08-15 | 成都微晶景泰科技有限公司 | Lens array image stitching method, device and storage medium |
| CN115330640A (en) * | 2022-10-11 | 2022-11-11 | 腾讯科技(深圳)有限公司 | Illumination mapping noise reduction method, device, equipment and medium |
| CN115980135A (en) * | 2022-12-05 | 2023-04-18 | 中国石油天然气集团有限公司 | Method and device for determining lithofacies type of carbonate rock |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2016134661A (en) | 2016-07-25 |
| JP6624785B2 (en) | 2019-12-25 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9607240B2 (en) | Image processing apparatus, image capturing apparatus, image processing method, image capturing method, and non-transitory computer-readable medium for focus bracketing | |
| CN107977940B (en) | Background blurring processing method, device and equipment | |
| CN108055452B (en) | Image processing method, device and equipment | |
| WO2019105154A1 (en) | Image processing method, apparatus and device | |
| WO2019085792A1 (en) | Image processing method and device, readable storage medium and electronic device | |
| US9992478B2 (en) | Image processing apparatus, image pickup apparatus, image processing method, and non-transitory computer-readable storage medium for synthesizing images | |
| JP5968073B2 (en) | Image processing apparatus, imaging apparatus, image processing method, and image processing program | |
| US10325356B2 (en) | Image processing device, image processing method, imaging device, and recording medium for exclusively performing adjustment processing or viewpoint change processing | |
| CN108024054A (en) | Image processing method, device and equipment | |
| CN108154514A (en) | Image processing method, device and equipment | |
| JP2015197745A (en) | Image processing apparatus, imaging apparatus, image processing method, and program | |
| JP6818463B2 (en) | Image processing equipment, image processing methods and programs | |
| WO2016113805A1 (en) | Image processing method, image processing apparatus, image pickup apparatus, program, and storage medium | |
| JP6370348B2 (en) | Image processing apparatus, image processing method, imaging apparatus, program, and storage medium | |
| US10217193B2 (en) | Image processing apparatus, image capturing apparatus, and storage medium that stores image processing program | |
| JP5619124B2 (en) | Image processing apparatus, imaging apparatus, image processing program, and image processing method | |
| US20140184853A1 (en) | Image processing apparatus, image processing method, and image processing program | |
| JP6757407B2 (en) | Image processing equipment, image processing method and image processing program | |
| KR20150032764A (en) | Method and image capturing device for generating artificially defocused blurred image | |
| JP6949494B2 (en) | Image processing equipment and image processing methods, imaging equipment, programs | |
| US10326951B2 (en) | Image processing apparatus, image processing method, image capturing apparatus and image processing program | |
| JP6938282B2 (en) | Image processing equipment, image processing methods and programs | |
| US9710897B2 (en) | Image processing apparatus, image processing method, and recording medium | |
| JP2017182668A (en) | Data processing apparatus, imaging apparatus, and data processing method | |
| JP2016201600A (en) | Image processing apparatus, imaging apparatus, image processing method, image processing program, and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15877761; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 15877761; Country of ref document: EP; Kind code of ref document: A1 |