
US20080002890A1 - Method, apparatus, and program for human figure region extraction - Google Patents

Method, apparatus, and program for human figure region extraction

Info

Publication number: US20080002890A1
Authority: US (United States)
Prior art keywords: region, human, estimated, face, image
Legal status: Granted
Application number: US11/819,465
Other versions: US8041081B2 (en)
Inventor: Yi Hu
Current Assignee: Fujifilm Corp
Original Assignee: Fujifilm Corp
Application filed by Fujifilm Corp
Assigned to FUJIFILM CORPORATION (assignment of assignors interest; assignors: HU, YI)
Publication of US20080002890A1
Application granted
Publication of US8041081B2
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/162 Detection; Localisation; Normalisation using pixel segmentation or colour matching
    • G06V40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Upon extraction of a human figure region in an image, a face or facial part is detected in the image, and an estimated region which is estimated to include the human figure region is determined from position information of the detected face or facial part. The human figure region is extracted in the estimated region. Judgment is made as to whether at least a portion of the human figure region exists in an outline periphery region of the estimated region, and the estimated region is extended and updated so as to include a near outer region near the human figure region in the outline periphery region and outside the estimated region, in the case where a result of the judgment is affirmative. The human figure region is extracted in the extended and updated estimated region.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates to a method and an apparatus for extracting a human figure region in an image. The present invention also relates to a program that causes a computer to execute the method.
  • 2. Description of the Related Art
  • For image editing tasks such as image classification, automatic trimming, and electronic photo album generation, extraction of human figure regions and recognition of poses in images are in demand. As a method of extracting human figure regions by separating them from backgrounds in images, the method described in Japanese Unexamined Patent Publication No. 2005-339363 has been known, for example. In this method, a person is photographed against a predetermined specific background, and the human figure region is cut out from the background based on the difference in colors between them.
  • In addition to this method using a predetermined background setting, a method of separating a human figure region from an arbitrary background in an image, given advance manual input of information on portions of the human figure region and the background, has been proposed in Y. Boykov and M. Jolly, "Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images", Proc. of Int. Conf. on Computer Vision, Vol. I, pp. 105-112, 2001. This method, which relies on advance specification of portions of the human figure region and the background, has been used mainly for interactive cutting.
  • Furthermore, an automatic human figure region extraction method has been proposed in G. Mori et al., “Recovering Human Body Configurations: Combining Segmentation and Recognition”, CVPR, pp. 1-8, 2004. In this method, a whole image is subjected to region segmentation processing and judgment is made on each region as to whether the region is a portion of a human figure region based on characteristics such as the shape, the color, and texture thereof. An assembly of the regions having been judged to be the portions is automatically extracted as a human figure region.
  • However, in this method of human figure region extraction using the characteristics of regions generated through segmentation, human figure regions cannot be extracted correctly when the degree of segmentation is inappropriate for human figure extraction, such as when the generated regions are too small for accurate judgment of human figure portions, or so large that they include background regions as well. The accuracy of human figure region extraction is therefore strongly affected by the degree of segmentation.
  • SUMMARY OF THE INVENTION
  • The present invention has been conceived based on consideration of the above circumstances, and an object of the present invention is therefore to provide a method, an apparatus, and a program that automatically extract a human figure region in a general image with improved extraction performance.
  • A human figure region extraction method of the present invention is a method of extracting a human figure region in an image, and the method comprises the steps of:
  • detecting a face or facial part in the image;
  • determining an estimated region which is estimated to include the human figure region, based on position information of the detected face or facial part;
  • extracting the human figure region in the estimated region;
  • judging whether at least a portion of the extracted human figure region exists in an outline periphery region in the estimated region;
  • extending and updating the estimated region so as to include a near outer region that is located near the human figure region in the outline periphery region and outside the estimated region, in the case where at least a portion of the human figure region has been judged to exist; and
  • extracting the human figure region in the extended and updated estimated region.
  • In the method described above, it is preferable for the steps of judging, extending and updating, and extracting in the extended and updated estimated region to be repeated until the human figure region has been judged not to exist in the outline periphery region.
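  • As a concrete illustration of this control flow, a minimal Python skeleton is sketched below; the five injected callables are hypothetical placeholders for the detection, determination, extraction, judgment, and extension steps described above, and the iteration cap is an optional guard, not part of the claimed method.

```python
def extract_human_figure(image, detect_face, determine_region, extract_figure,
                         periphery_overlap, extend_region, max_iterations=20):
    """Skeleton of the claimed iterative procedure; the callables are
    hypothetical stand-ins for the means described in the text, injected so
    that the control flow itself is self-contained."""
    face = detect_face(image)                      # detect a face or facial part
    region = determine_region(face)                # initial estimated region E
    figure = extract_figure(image, region)         # extract H inside E
    for _ in range(max_iterations):                # optional preset upper bound
        overlap = periphery_overlap(figure, region)
        if not overlap:                            # H is clear of the outline periphery
            break
        region = extend_region(region, overlap)    # extend and update E
        figure = extract_figure(image, region)     # re-extract H in the extended E
    return figure
```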
  • A human figure region extraction apparatus of the present invention is an apparatus for extracting a human figure region in an image, and the apparatus comprises:
  • face detection means for detecting a face or facial part in the image;
  • estimated region determination means for determining an estimated region which is estimated to include the human figure region, based on position information of the detected face or facial part;
  • human figure region extraction means for extracting the human figure region in the estimated region; and
  • judgment means for judging whether at least a portion of the extracted human figure region exists in an outline periphery region in the estimated region, wherein,
  • in the case where the judgment means has judged that at least a portion of the human figure region exists, the estimated region determination means extends and updates the estimated region so as to include a near outer region that is located near the human figure region in the outline periphery region and outside the estimated region, and
  • the human figure region extraction means extracts the human figure region in the extended and updated estimated region.
  • In the human figure region extraction apparatus, it is preferable for the judgment means, the estimated region determination means, and the human figure region extraction means to repeatedly carry out the judgment on whether at least a portion of the human figure region exists in the outline periphery region, the extension and update of the estimated region, and the extraction of the human figure region in the extended and updated estimated region until the judgment means has judged that the human figure region does not exist in the outline periphery region.
  • The human figure region extraction means can calculate an evaluation value for each pixel in the estimated region from image data therein and from image data in an outside region located outside the estimated region, and can extract the human figure region based on the evaluation value.
  • In addition, the human figure region extraction means can extract the human figure region by using skin color information in the image.
  • A human figure region extraction program of the present invention is a program for extracting a human figure region in an image, and the program causes a computer to execute the procedures of:
  • detecting a face or facial part in the image;
  • determining an estimated region which is estimated to include the human figure region, based on position information of the detected face or facial part;
  • extracting the human figure region in the estimated region;
  • judging whether at least a portion of the extracted human figure region exists in an outline periphery region in the estimated region;
  • extending and updating the estimated region so as to include a near outer region that is located near the human figure region in the outline periphery region and outside the estimated region, in the case where at least a portion of the human figure region has been judged to exist; and
  • extracting the human figure region in the extended and updated estimated region.
  • The estimated region may be determined from the position information of the face or facial part alone, or from the position information together with other information, such as face size information in the case of a face, for example.
  • The outline periphery region refers to a region of a predetermined range from an outline of the estimated region within the estimated region, and may refer to a region of the predetermined range including the outline, a region of the predetermined range excluding the outline, or only the outline.
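  • In mask form, such a periphery can be obtained by shaving the interior off the estimated region, as in the sketch below; the `width` parameter is an assumed stand-in for the "predetermined range".

```python
import numpy as np
from scipy.ndimage import binary_erosion

def outline_periphery(region_mask: np.ndarray, width: int = 5) -> np.ndarray:
    """Pixels of the estimated region lying within `width` pixels of its
    outline. `region_mask` marks the estimated region E; `width` is an
    assumed tuning parameter for the predetermined range."""
    region_mask = region_mask.astype(bool)
    interior = binary_erosion(region_mask, iterations=width)
    return region_mask & ~interior
```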
  • According to the human figure region extraction method and apparatus of the present invention, the face or facial part is detected in the image, and the estimated region which is estimated to include the human figure region is determined from the position information of the detected face or facial part. The human figure region is extracted in the estimated region, and judgment is made on whether at least a portion of the human figure region exists in the outline periphery region. In the case where a result of the judgment is affirmative, the estimated region is extended and updated so as to include the near outer region located near the human figure region existing in the outline periphery region and outside the estimated region. The human figure region is then extracted in the extended and updated estimated region. In this manner, attention is paid to the characteristics of human figure regions (that is, a torso is connected below a head and limbs are connected to the torso) to determine the estimated region that is to include the human figure region, with reference to a head based on the position information or the like of the head identified by the face or facial part. The human figure region is extracted in the estimated region, and the estimated region is extended and updated based on the result of human figure region extraction in the estimated region, in order to extract the human figure region in the extended and updated estimated region. Therefore, correction of the estimated region as a range of human figure extraction can be carried out according to the diversity in the states of the human figure, which can take various poses or the like. Consequently, human figure region extraction processing can be carried out automatically and accurately in a general image.
  • In the human figure region extraction method and apparatus of the present invention, if processing of judgment on whether at least a portion of the human figure region exists in the outline periphery region and processing of estimated region extension and update and human figure region extraction in the extended and updated estimated region are repeated until the human figure region does not exist in the outline periphery region, the human figure region can be included in the estimated region extended and updated according to the result of human figure region extraction, even in the case where the human figure region has not been contained in the estimated region. In this manner, the whole human figure region can be extracted with certainty.
  • In the case where the human figure region extraction is carried out based on the evaluation value calculated for each pixel in the estimated region based on the image data therein and in the outside region located outside the estimated region, judgment can be appropriately made as to whether each pixel in the estimated region represents the human figure region or a background region, by using the image data of the estimated region largely including the human figure region and the image data of the outside region located outside the estimated region and including largely the background region.
  • In addition, in the case where the human figure region extraction is carried out by use of the skin color information in the image, accuracy of the human figure region extraction can be improved.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram showing an embodiment of a human figure region extraction apparatus of the present invention;
  • FIGS. 2A and 2B show how a face F is detected by face detection means in FIG. 1;
  • FIG. 3A is a graph showing R (Red) and G (Green) in a face region model GF while FIG. 3B is a graph showing R and G in a face background region model Gc;
  • FIGS. 4A and 4B show a method of cutting a region fb into a face region and a face background region;
  • FIG. 5 shows an example of an estimated region determined by estimated region determination means in FIG. 1;
  • FIG. 6 shows a method of judgment processing and estimated region extension and update processing by judgment means and the estimated region determination means in FIG. 1;
  • FIGS. 7A to 7C show an example of a human figure region H extracted in an estimated region E extended and update by the estimated region determination means in FIG. 1, and FIGS. 7A to 7C respectively show the estimated region E and the extracted human figure region H in initial processing, in processing for the second time, and in final processing;
  • FIG. 8 is a flow chart as an embodiment of a human figure region extraction method of the present invention;
  • FIG. 9 shows another method of judgment processing and estimated region extension and update processing by the judgment means and the estimated region determination means; and
  • FIGS. 10A and 10B show how the estimated region is extended and updated by the estimated region determination means.
  • DESCRIPTION OF THE PREFERRED EMBODIMENT
  • Hereinafter, an embodiment of a human figure region extraction apparatus of the present invention will be described with reference to the accompanying drawings. A human figure region extraction apparatus as an embodiment of the present invention shown in FIG. 1 is realized by execution of an image processing program read into an auxiliary storage apparatus on a computer (such as a personal computer). The image processing program is stored in an information recording medium such as a CD-ROM or distributed via a network such as the Internet, and installed in the computer.
  • The human figure region extraction apparatus in this embodiment automatically extracts a human figure region H in a general image P, and comprises face detection means 10, estimated region determination means 20, human figure region extraction means 30, and judgment means 40. The face detection means 10 detects a face F in the image P. The estimated region determination means 20 determines an estimated region E which is estimated to include the human figure region H, based on position information and size information of the detected face F. The human figure region extraction means 30 extracts the human figure region H in the determined estimated region E. The judgment means 40 judges whether at least a portion of the human figure region H exists in an outline periphery region of the estimated region E.
  • In the case where the judgment means 40 has judged that at least a portion of the human figure region H exists in the outline periphery region of the estimated region E, the estimated region determination means 20 extends and updates the estimated region E so as to include a near outer region existing outside the estimated region E and near the human figure region H included in the outline periphery region. The human figure region extraction means 30 then extracts the human figure region H in the extended and updated estimated region E (hereinafter, the extended and updated estimated region E will simply be referred to as the extended estimated region).
  • The face detection means 10 detects the face F in the image P, and detects a region representing a face as the face F. The face detection means 10 firstly obtains detectors corresponding to characteristic quantities, and the detectors recognize a detection target such as a face or eyes by pre-learning the characteristic quantities of pixels in sample images wherein the detection target is known, that is, by pre-learning directions and magnitudes of changes in density of the pixels in the images, as has been described in Japanese Unexamined Patent Publication No. 2006-139369, for example. The face detection means 10 then detects a face image by using this known technique, through scanning of the image with the detectors. The face detection means 10 thereafter detects eye positions Er and El in the face image.
  • The face detection means 10 finds a distance D between the detected eye positions Er and El as shown in FIG. 2A, and determines a D×D square region fa such that the midpoint of the upper side of the region fa is positioned at the midpoint between the eye positions. The face detection means also determines a rectangular region fb of 3.5 D (in the vertical direction)×3.0 D (in the horizontal direction) such that the center of the rectangular region is positioned at the midpoint between the eye positions. Thereafter, the face detection means 10 determines a region fd of a predetermined size that is sufficiently large to include the regions fa and fb. The region outside the region fa within the region fd is a region fc. Since the region fa has been set to a size that fits well within the face F, image data in the region fa mainly include image data of the face F, while image data in the region fc mainly include image data of a background region.
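  • This geometry can be reproduced directly from the two eye coordinates; the sketch below assumes (x, y) coordinates with y increasing downward, and its box format and the padding used for fd are illustrative choices.

```python
import numpy as np

def face_regions(er, el):
    """Regions fa (D x D square), fb (3.5D x 3.0D rectangle), and fd from the
    detected eye positions Er and El, following the stated geometry. Boxes
    are (x0, y0, x1, y1); coordinate convention and fd padding are assumed."""
    er, el = np.asarray(er, float), np.asarray(el, float)
    d = float(np.linalg.norm(er - el))          # eye distance D
    mid = (er + el) / 2.0                       # midpoint between the eyes
    # fa: D x D square whose upper-side midpoint sits at `mid`
    fa = (mid[0] - d / 2, mid[1], mid[0] + d / 2, mid[1] + d)
    # fb: 3.0D wide x 3.5D tall rectangle centered at `mid`
    fb = (mid[0] - 1.5 * d, mid[1] - 1.75 * d,
          mid[0] + 1.5 * d, mid[1] + 1.75 * d)
    # fd: any box comfortably containing fa and fb (padding by D is assumed)
    fd = (fb[0] - d, fb[1] - d, fb[2] + d, fb[3] + d)
    return fa, fb, fd
```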
  • A set of pixels in each of the regions fa and fc is then divided into 8 sets according to a color clustering method described in M. Orchard and C. Bouman, “Color Quantization of Images”, IEEE Transactions on Signal Processing, Vol. 39, No. 12, pp. 2677-2690, 1991.
  • In the color clustering method, the direction along which the variation in colors (color vectors) is greatest is found in each of a plurality of clusters (sets of pixels) Cn, and the cluster Cn is split into two clusters C2n and C2n+1 by a plane that is perpendicular to that direction and passes through the mean value (mean vector) of the colors of the cluster Cn. According to this method, the whole set of pixels, however varied its colors, can be segmented into subsets of the same or similar colors.
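  • A minimal sketch of this binary-split clustering follows. Note two simplifications: the choice of which cluster to split next uses the largest total variance, whereas Orchard and Bouman select by the largest principal eigenvalue, and each cluster is assumed to retain at least two pixels.

```python
import numpy as np

def split_cluster(pixels: np.ndarray):
    """One binary split: divide a cluster of RGB pixels (N x 3) by the plane
    through its mean, perpendicular to the direction of greatest color
    variance (the principal eigenvector of the covariance matrix)."""
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    principal = eigvecs[:, -1]                  # direction of greatest variance
    side = (pixels - mean) @ principal <= 0.0   # sign of distance to the plane
    return pixels[side], pixels[~side]          # clusters C_2n and C_2n+1

def cluster_colors(pixels: np.ndarray, n_clusters: int = 8):
    """Repeatedly split the most varied cluster until `n_clusters` subsets
    remain (8 in the embodiment)."""
    clusters = [pixels]
    while len(clusters) < n_clusters:
        i = int(np.argmax([np.trace(np.cov(c, rowvar=False)) for c in clusters]))
        a, b = split_cluster(clusters.pop(i))
        clusters += [a, b]
    return clusters
```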
  • A mean vector urgb, a variance-covariance matrix Σ, and the like of a Gaussian distribution of R (Red), G (Green), and B (Blue) are calculated for each of the 8 sets in each of the regions fa and fc, and a GMM (Gaussian Mixture Model) model G is found in an RGB color space in each of the regions fa and fc according to the following equation (1). The GMM model G found from the region fa that largely includes the image data of the face F is a face region model GF and the GMM model G found from the region fc that largely includes the image data of the background of the face F is a face background region model GC.
  • $G = \sum_{i=1}^{8} \lambda_i \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}} \exp\!\left[-\frac{1}{2}(x-u_i)^t\,\Sigma_i^{-1}\,(x-u_i)\right]$  (1)
  • In Equation (1), i indexes the mixture components of the Gaussian distributions (the 8 sets of pixels), and λ, u, Σ, and d respectively denote the mixture weights of the distributions, the mean vectors of the Gaussian distributions of RGB, the variance-covariance matrices of the Gaussian distributions, and the number of dimensions of the characteristic vector.
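  • Equation (1) translates directly into code; the sketch below evaluates the mixture density for one color vector given the per-component weights, means, and covariances.

```python
import numpy as np

def gmm_density(x, weights, means, covs):
    """Evaluate the mixture density G(x) of Equation (1) for a color vector
    x, given per-component mixture weights, mean vectors, and covariance
    matrices (8 components and d = 3 for RGB in the embodiment)."""
    x = np.asarray(x, float)
    d = x.size
    g = 0.0
    for lam, u, sigma in zip(weights, means, covs):
        diff = x - u
        norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
        g += lam * norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)
    return g
```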
  • FIG. 3A is a graph showing R and G in the face region model GF while FIG. 3B is a graph showing R and G in the face background region model GC. Each of the graphs comprises 8 elliptic Gaussian distributions, and the face region model GF has different probability density from the face background region model GC.
  • The region fb is then cut into a face region and a background region according to the region segmentation methods described in Y. Boykov and M. Jolly, "Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images", Proc. of Int. Conf. on Computer Vision, Vol. I, pp. 105-112, 2001 and C. Rother et al., "GrabCut: Interactive Foreground Extraction using Iterated Graph Cuts", ACM Transactions on Graphics (SIGGRAPH '04), 2004, based on the face region model GF and the face background region model GC.
  • In the region segmentation methods described above, a graph is generated as shown in FIG. 4A comprising nodes representing the respective pixels in the image, nodes S and T representing labels (either the face region or the face background region in this embodiment) for the respective pixels, n-links connecting the nodes of pixels neighboring each other, and t-links connecting the nodes of the respective pixels with the node S representing the face region and the node T representing the face background region. Each of the n-links represents a likelihood (cost) of the neighboring pixels belonging to the same region by the thickness thereof, and the likelihood (cost) can be found from a distance between the neighboring pixels and a difference in the color vectors thereof. The t-links represent likelihoods (cost) of each of the pixels belonging to the face region and to the face background region, and the likelihoods (cost) can be found for each of the pixels by calculating probabilities that the color vector thereof corresponds to probability density functions for the face region GF and the face background region GC.
  • The face region and the face background region are mutually exclusive, and the region fb is cut into the face region and the face background region as shown in FIG. 4B by cutting, for each pixel, one of the two t-links connecting its node to the node S or the node T, and by cutting the n-links that connect neighboring nodes having different labels. By minimizing the total cost of the cut t-links and n-links, the region segmentation can be carried out optimally, and the face region can be detected efficiently. The face region extracted in this manner is detected as the face F.
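  • The sketch below reproduces this construction with networkx, favoring clarity over speed. The likelihood arrays are assumed to come from the two GMM models, and the exponential n-link weight is a commonly used choice standing in for a formula the text does not spell out.

```python
import numpy as np
import networkx as nx

def graph_cut_mask(img, p_fg, p_bg, smooth=10.0, eps=1e-9):
    """Binary segmentation by s-t minimum cut in the spirit of the cited
    Boykov-Jolly method. `p_fg` and `p_bg` hold each pixel's likelihood under
    the two region models (e.g. G_F and G_C). Returns a boolean mask of the
    pixels on the source (foreground) side of the cut."""
    h, w = img.shape[:2]
    g = nx.DiGraph()
    s, t = "S", "T"
    for y in range(h):
        for x in range(w):
            p = (y, x)
            # t-links: cutting S->p labels p background at cost -log p_bg;
            # cutting p->T labels p foreground at cost -log p_fg
            g.add_edge(s, p, capacity=float(-np.log(p_bg[y, x] + eps)))
            g.add_edge(p, t, capacity=float(-np.log(p_fg[y, x] + eps)))
            # n-links to the right and lower neighbours, symmetric capacity:
            # similar colors are costly to separate
            for q in ((y, x + 1), (y + 1, x)):
                if q[0] < h and q[1] < w:
                    d2 = float(np.sum((img[p].astype(float) - img[q].astype(float)) ** 2))
                    cap = smooth * np.exp(-d2 / (2.0 * 255.0 ** 2))
                    g.add_edge(p, q, capacity=cap)
                    g.add_edge(q, p, capacity=cap)
    _, (source_side, _) = nx.minimum_cut(g, s, t)
    mask = np.zeros((h, w), dtype=bool)
    for node in source_side - {s}:
        mask[node] = True
    return mask
```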
  • The estimated region determination means 20 determines the estimated region E which is estimated to include the human figure region, based on the position information and the size information of the face F detected by the face detection means 10. As shown in FIG. 5, the estimated region determination means 20 determines a rectangular region E1 which is centered at the position Fc of the center of the face F and whose horizontal width and vertical width are 1.5 times the maximum horizontal width (Dw in FIG. 2B) and the maximum vertical width (Dh in FIG. 2B) of the face F, respectively. The estimated region determination means 20 also determines, below the region E1, a rectangular region E2 whose horizontal width and vertical width are 3 times the horizontal width and 7 times the vertical width of the region E1. The estimated region determination means 20 then determines the regions E1 and E2 together as the estimated region E (where the lower side of the region E1 is in contact with the upper side of the region E2, so the regions E1 and E2 are not disconnected).
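  • Under an assumed (x, y) pixel coordinate convention with y increasing downward, the stated ratios give the following construction of E; the box format is illustrative.

```python
def estimated_region(face_center, face_w, face_h):
    """Initial estimated region E = E1 + E2 from the detected face, following
    the stated ratios: E1 is 1.5x the face's maximum width/height, centered
    on the face; E2, directly below E1, is 3x as wide and 7x as tall as E1.
    Boxes are (x0, y0, x1, y1)."""
    cx, cy = face_center
    e1_w, e1_h = 1.5 * face_w, 1.5 * face_h
    e1 = (cx - e1_w / 2, cy - e1_h / 2, cx + e1_w / 2, cy + e1_h / 2)
    e2_w, e2_h = 3.0 * e1_w, 7.0 * e1_h
    # E2's upper side coincides with E1's lower side, so E is connected
    e2 = (cx - e2_w / 2, e1[3], cx + e2_w / 2, e1[3] + e2_h)
    return e1, e2
```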
  • The estimated region determination means 20 has a function of extending and updating the estimated region E. In the case where the judgment means 40 that will be described later has judged that at least a portion of the human figure region H exists in the outline periphery region in the estimated region E, the estimated region determination means 20 extends and updates the estimated region E so as to include a near outer region existing near the human figure region H in the outline periphery region and located outside the estimated region E.
  • The human figure region extraction means 30 calculates an evaluation value for each of the pixels in the estimated region E, based on image data in the estimated region E determined by the estimated region determination means 20 and image data of an outside region OR located outside the estimated region E. The human figure region extraction means 30 extracts the human figure region H based on the evaluation value. In this embodiment, the evaluation value is a likelihood.
  • In the estimated region E and in the outside region OR located outside the estimated region E, the set of pixels therein is divided into 8 sets by the color clustering method described above. A mean vector urgb, a variance-covariance matrix Σ, and the like of a Gaussian distribution of R, G, and B are calculated for each of the 8 sets in each of the regions E and OR, and a GMM model G is found in the RGB color space for each of the regions E and OR according to Equation (1). The GMM model G found from the estimated region E, which is estimated to include more of the human figure region, is the human figure region model GH, and the GMM model G found from the outside region OR, which is located outside the estimated region E and includes more of the background region, is the background region model GB.
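A sketch of fitting the two color models with scikit-learn follows. Note the simplification: the description fits one Gaussian per cluster obtained by color clustering, whereas GaussianMixture fits all 8 components jointly by EM, so this is a stand-in for the model of Equation (1) rather than the disclosed procedure:

    from sklearn.mixture import GaussianMixture

    def fit_region_models(image, mask_e):
        """image: (H, W, 3) RGB array; mask_e: boolean mask of the estimated
        region E.  Returns the models (GH, GB) fitted on E and on OR."""
        inside = image[mask_e].astype(float)      # pixels of the estimated region E
        outside = image[~mask_e].astype(float)    # pixels of the outside region OR
        gh = GaussianMixture(n_components=8, covariance_type='full',
                             random_state=0).fit(inside)
        gb = GaussianMixture(n_components=8, covariance_type='full',
                             random_state=0).fit(outside)
        return gh, gb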
  • The estimated region E is cut into the human figure region H and the background region BK by using the same region segmentation methods as the face detection means 10. Firstly, an n-link representing a likelihood (cost) of each pair of neighboring pixels belonging to the same region is found from the distance between the neighboring pixels and the difference in their color vectors. By calculating the probability of the color vector of each of the pixels under the probability density function of the human figure region model GH or under the probability density function of the background region model GB, a t-link representing a likelihood of each of the pixels belonging to the human figure region or the background region can be found. Thereafter, the estimated region E is cut into the human figure region H and the background region BK by finding the minimal-cost cut according to the above-described region segmentation optimization method. In this manner, the human figure region H is extracted.
  • Furthermore, the human figure region extraction means 30 judges that a pixel in the estimated region E represents a skin color region in the case where its values (0˜255) of R, G, and B satisfy Equation (2) below, and updates the values of the t-links connecting the nodes of the pixels belonging to the skin color region to the node S representing the human figure region. Since this procedure increases the likelihood (cost) that the pixels in the skin color region are labeled as the human figure region, human figure region extraction performance can be improved by applying skin color information specific to human bodies to the extraction.

  • R>95 and G>40 and B>20 and max{R,G,B}−min{R,G,B}>15 and |R−G|>15 and R>G and R>B  (2)
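Equation (2) transcribes directly; the vectorized form over a whole RGB image is an added convenience:

    import numpy as np

    def skin_mask(image):
        """image: (H, W, 3) uint8 RGB.  Returns True where Equation (2) holds."""
        img = image.astype(int)
        r, g, b = img[..., 0], img[..., 1], img[..., 2]
        mx, mn = img.max(axis=-1), img.min(axis=-1)
        return ((r > 95) & (g > 40) & (b > 20) &
                (mx - mn > 15) & (np.abs(r - g) > 15) &
                (r > g) & (r > b))

In the graph-cut sketch above, biasing these pixels toward the human figure region amounts to raising the capacity of their S-side t-links (d_back), making it expensive to cut them away from the node S; the exact increment is not specified in this description.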
  • The judgment means 40 judges whether at least a portion of the human figure region H extracted by the human figure region extraction means 30 exists in the outline periphery region in the estimated region E. As shown in FIG. 6, the judgment means 40 carries out this judgment by finding presence or absence of a region QH wherein the extracted human figure region H overlaps an outline periphery region Q as a region of a predetermined range from an outline L of the estimated region E.
  • In the case where the judgment means 40 has judged that the human figure region H does not exist in the outline periphery region Q, the human figure region extraction is complete. However, in the case where at least a portion of the human figure region H has been judged to exist in the outline periphery region Q, the estimated region determination means 20 sets as a near outer region EN the part, located outside the estimated region E, of a region of a predetermined range from the region QH having the overlap between the human figure region H and the outline periphery region Q, and extends and updates the estimated region E to include the near outer region EN. The human figure region extraction means 30 thereafter extracts the human figure region H again in the extended estimated region E, and the judgment means 40 judges whether at least a portion of the human figure region H exists in the outline periphery region Q in the extended estimated region E.
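A sketch of this judgment and of the construction of the near outer region EN, treating the two "regions of a predetermined range" as morphological bands; the band width d, the growth radius r, and the use of scipy.ndimage are illustrative assumptions:

    from scipy.ndimage import binary_dilation, binary_erosion

    def extend_if_touching(mask_e, mask_h, d=5, r=15):
        """mask_e: boolean estimated region E; mask_h: extracted figure H.
        Returns (done, possibly extended mask_e)."""
        # Outline periphery region Q: pixels of E within d pixels of its outline L.
        q = mask_e & ~binary_erosion(mask_e, iterations=d)
        qh = q & mask_h                        # region QH: overlap of H with Q
        if not qh.any():
            return True, mask_e                # H stays clear of the outline: done
        # Near outer region EN: within r pixels of QH but outside E.
        en = binary_dilation(qh, iterations=r) & ~mask_e
        return False, mask_e | en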
  • The procedures described above, that is, the extension and update of the estimated region E by the estimated region determination means 20, the extraction of the human figure region H in the extended estimated region E by the human figure region extraction means 30, and the judgment of presence or absence of at least a portion of the human figure region H in the outline periphery region Q by the judgment means 40, are carried out until the judgment means 40 has judged that the human figure region H does not exist in the outline periphery region Q.
  • FIGS. 7A to 7C show an example of repetitive extraction of the human figure region H while the estimated region E is extended and updated. FIG. 7A shows the estimated region E determined initially based on the position information and the like of the face F, and the human figure region H extracted therein. FIG. 7B shows the estimated region E after the first extension and update based on the initial extraction result shown in FIG. 7A, and the human figure region H extracted in the extended estimated region E. FIG. 7C shows the ultimately determined estimated region E and the human figure region H extracted therein.
  • An embodiment of the human figure region extraction method of the present invention will be described next with reference to the flow chart in FIG. 8. The face detection means 10 detects the face F in the image P (Step ST1). Thereafter, the estimated region determination means 20 determines the estimated region E which is estimated to include the human figure region H, according to the position information and the size information of the detected face F (Step ST2). The human figure region extraction means 30 extracts the human figure region H in the determined estimated region E (Step ST3), and the judgment means 40 judges whether at least a portion of the human figure region H exists in the outline periphery region in the estimated region E (Step ST4). In the case where the result of the judgment is affirmative, the estimated region E is extended and updated so as to include the near outer region located outside the estimated region E and near the human figure region H in the outline periphery region (Step ST5). The flow of processing then returns to Step ST3, and the human figure region H is extracted in the extended estimated region E. The extraction of the human figure region H is completed when the human figure region H has been judged not to exist in the outline periphery region after repetition of the procedures from Step ST3 to Step ST5.
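Tying the pieces together, the flow of FIG. 8 reduces to the loop below. The names detect_face, make_estimated_region, and segment_figure are hypothetical glue around the earlier sketches (segment_figure would apply the graph cut within E using GH and GB), and the iteration cap anticipates the preset-repetition variation noted at the end of the description:

    def extract_human_figure(image, max_iters=10):
        face = detect_face(image)                    # ST1 (any detector; stand-in)
        mask_e = make_estimated_region(image, face)  # ST2: initial E from the face
        mask_h = None
        for _ in range(max_iters):                   # safety cap on the repetitions
            gh, gb = fit_region_models(image, mask_e)       # refit GH/GB: E changed
            mask_h = segment_figure(image, mask_e, gh, gb)  # ST3: cut E into H and BK
            done, mask_e = extend_if_touching(mask_e, mask_h)  # ST4 / ST5
            if done:
                break
        return mask_h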
  • According to this embodiment, the face F is detected in the image P, and the estimated region E which is estimated to include the human figure region H is determined based on the position information and the like of the detected face F. The human figure region H is extracted in the estimated region E, and a judgment is made as to whether at least a portion of the human figure region H exists in the outline periphery region of the estimated region E. Until the human figure region H has been judged not to exist in the outline periphery region, the estimated region E is extended and updated so as to include the near outer region that is near the human figure region H in the outline periphery region and outside the estimated region E, and the human figure region H is extracted again in the extended estimated region. By repeating these procedures, the human figure region can be brought into the extended estimated region E based on the extraction result even in the case where the human figure region H was not contained in the initial estimated region E. In this manner, the whole human figure region can be extracted automatically and reliably in general images.
  • The present invention is not necessarily limited to the embodiment described above. For example, in the embodiment described above, the estimated region determination means 20 determines the estimated region E which is estimated to include the human figure region H, based on the position information and the size information of the face F detected by the face detection means 10. However, the face detection means 10 can detect anything by which the estimated region determination means 20 can identify a position of a head as a reference to determine a position of the estimated region which is estimated to include the human figure region. Therefore, the face detection means 10 may detect not only a position of a face but also a position of other facial parts such as eyes, a nose, or a mouth. Furthermore, if the detected face or facial part can be used to identify an approximate size of the head from a size of the face, a distance between the eyes, a size of the nose, a size of the mouth, or the like, the size of the estimated region can be determined more accurately.
  • For example, a distance D may be found between the positions of the eyes detected by the face detection means 10 so that a rectangular region E1 of 3D×3D centered at the midpoint of the eyes can be determined. A rectangular region E2 whose horizontal width and vertical width are 3 times the horizontal width and 7 times the vertical width of the region E1 is then determined below the region E1, and the regions E1 and E2 can be determined as the estimated region E (where the lower side of the region E1 is in contact with the upper side of the region E2 and the regions E1 and E2 are not disconnected). In the case where the estimated region E is determined only from the position of the face detected by the face detection means 10, the estimated region E can be a region of a preset shape and size determined from a position of the center of the face as a reference point.
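For the eye-based variant, a sketch under the same conventions (the horizontal centering of E2 under E1 is again an assumption):

    import math

    def estimated_region_from_eyes(eye_left, eye_right):
        """eye_left, eye_right: (x, y) eye positions.  Returns E1 and E2 as
        (left, top, width, height) rectangles."""
        d = math.dist(eye_left, eye_right)              # inter-ocular distance D
        cx = (eye_left[0] + eye_right[0]) / 2.0         # midpoint of the eyes
        cy = (eye_left[1] + eye_right[1]) / 2.0
        e1 = (cx - 1.5 * d, cy - 1.5 * d, 3 * d, 3 * d)  # E1: 3D x 3D, centred
        # E2: 3 x wider and 7 x taller than E1, its top edge on E1's bottom edge.
        e2 = (cx - 4.5 * d, cy + 1.5 * d, 9 * d, 21 * d)
        return e1, e2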
  • The estimated region E may be any region that can sufficiently include the human figure region, and may have an arbitrary shape and size, such as a rectangle, a circle, or an ellipse.
  • When the human figure region H is extracted by the human figure region extraction means 30 through calculation of the evaluation value for each of the pixels in the estimated region E based on the image data of the estimated region E and based on the image data of the outside region OR located outside the estimated region E, the image data of the estimated region E and the image data of the outside region OR may be image data representing the entirety or a part of each region.
  • The human figure region extraction means 30 judges whether each of the pixels in the estimated region E represents the skin color region according to the condition represented by Equation (2) above. However, this judgment may be carried out based on skin color information that is specific to the human figure in the image P. For example, a GMM model G represented by Equation (1) above is found from a set of pixels judged to satisfy the condition of Equation (2) in a predetermined region such as in the image P, and used as a probability density function including the skin color information specific to the human figure in the image P. Based on the probability density function, whether each of the pixels in the estimated region E represents the skin color region can be judged again.
  • In the above embodiment, the judgment means 40 judges the presence or absence of the region QH having an overlap between the outline periphery region Q and the human figure region H, and the estimated region determination means 20 extends and updates the estimated region E so as to include the near outer region EN located outside the estimated region E, out of the region of the predetermined range from the region QH. However, the estimated region E may be extended and updated through judgment of the presence or absence of at least a portion of the human figure region in the outline periphery region in the estimated region according to a method described below or according to another method.
  • More specifically, as shown in FIG. 9, a predetermined point on the outline L of the estimated region E is designated as a starting point Ls, and a target pixel Lp sequentially denotes each of the pixels along the outline L in the clockwise or counterclockwise direction. Whether at least a portion of the human figure region H exists in the outline periphery region can be judged by judging whether the human figure region H exists in a region Qp inside the estimated region E, within a region of a predetermined range from the pixel Lp. In the case where presence of at least a portion of the human figure region has been found, the position of the target pixel Lp is updated according to the method described below.
  • Firstly, as shown in FIG. 10A, a straight line Sa is found passing through the pixels Lpm−1 and Lpm+1 that sandwich, along the outline L, a pixel Lpm whose position is to be updated, and an outward normal line Sb to the line Sa is found passing through the pixel Lpm. Let the intersection of the lines Sa and Sb be denoted by O. The position of the pixel Lpm is updated to a point Lpm′ on the normal line Sb at a predetermined distance λ from the point O (where λ is the increment by which the outline grows in one update). Thereafter, as shown in FIG. 10B, the outline L of the estimated region E is updated to pass through the point Lpm′, and the estimated region E is updated as the region surrounded by the updated outline L. In this manner, the estimated region E can be extended and updated.
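The update of a single outline pixel reduces to plane geometry. A sketch, assuming a predicate inside(p) that tests membership in the current estimated region E and is used here to pick the outward direction of the normal:

    import numpy as np

    def grow_outline_point(lp_prev, lp, lp_next, lam, inside):
        """lp_prev, lp, lp_next: consecutive (x, y) points on the outline L;
        lam: the growth increment; inside(p): True if p lies inside E."""
        a = np.asarray(lp_prev, float)
        p = np.asarray(lp, float)
        chord = np.asarray(lp_next, float) - a       # direction of the line Sa
        u = chord / np.linalg.norm(chord)
        o = a + np.dot(p - a, u) * u                 # point O: foot of Sb on Sa
        n = np.array([-u[1], u[0]])                  # unit normal to Sa (line Sb)
        cand = o + lam * n
        if inside(tuple(cand)):                      # flip if it points inward
            cand = o - lam * n
        return tuple(cand)                           # the updated point Lpm'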
  • In the above embodiment, the extension and update of the estimated region E and the human figure region extraction in the extended estimated region and the like are carried out in the case where the judgment means 40 has judged that at least a portion of the human figure region exists in the outline periphery region of the estimated region E. However, the extension and update of the estimated region and the extraction of the human figure region therein may be carried out in the case where the number of positions at which the human figure region exists in the outline periphery region in the estimated region is equal to or larger than a predetermined number.
  • In the above embodiment, the extension and update of the estimated region and the extraction of the human figure region therein are repeated until the human figure region has been judged not to exist in the outline periphery region. However, a maximum number of the repetitions may be set in advance so that the extraction of the human figure region can be completed within a predetermined number of repetitions that is preset to be equal to or larger than 1.

Claims (9)

1. A human figure region extraction method for extracting a human figure region in an image, the method comprising the steps of:
detecting a face or facial part in the image;
determining an estimated region which is estimated to include the human figure region, based on position information of the detected face or facial part;
extracting the human figure region in the estimated region;
judging whether at least a portion of the extracted human figure region exists in an outline periphery region in the estimated region;
extending and updating the estimated region so as to include a near outer region located near the human figure region in the outline periphery region and outside the estimated region, in the case where at least a portion of the human figure region has been judged to exist; and
extracting the human figure region in the extended and updated estimated region.
2. The human figure region extraction method according to claim 1, wherein the step of judging whether at least a portion of the human figure region exists in the outline periphery region, the step of extending and updating the estimated region, and the step of extracting the human figure region in the extended and updated estimated region are repeated until the human figure region has been judged not to exist in the outline periphery region.
3. A human figure region extraction apparatus for extracting a human figure region in an image, the apparatus comprising:
face detection means for detecting a face or facial part in the image;
estimated region determination means for determining an estimated region which is estimated to include the human figure region, based on position information of the detected face or facial part;
human figure region extraction means for extracting the human figure region in the estimated region; and
judgment means for judging whether at least a portion of the extracted human figure region exists in an outline periphery region in the estimated region, wherein,
in the case where the judgment means has judged that at least a portion of the human figure region exists, the estimated region determination means extends and updates the estimated region so as to include a near outer region located near the human figure region in the outline periphery region and outside the estimated region and
the human figure region extraction means extracts the human figure region in the extended and updated estimated region.
4. The human figure region extraction apparatus according to claim 3, wherein the judgment means, the estimated region determination means, and the human figure region extraction means repeatedly judge whether at least a portion of the human figure region exists in the outline periphery region, extend and update the estimated region, and extract the human figure region in the extended and updated estimated region, until the judgment means has judged that the human figure region does not exist in the outline periphery region.
5. The human figure region extraction apparatus according to claim 3, wherein the human figure region extraction means calculates an evaluation value for each pixel in the estimated region from image data therein and from image data in an outside region located outside the estimated region, and extracts the human figure region based on the evaluation value.
6. The human figure region extraction apparatus according to claim 4, wherein the human figure region extraction means calculates an evaluation value for each pixel in the estimated region from image data therein and from image data in an outside region located outside the estimated region, and extracts the human figure region based on the evaluation value.
7. The human figure region extraction apparatus according to claim 5, wherein the human figure region extraction means extracts the human figure region by using skin color information in the image.
8. The human figure region extraction apparatus according to claim 6, wherein the human figure region extraction means extracts the human figure region by using skin color information in the image.
9. A computer-readable recording medium storing a program for extracting a human figure region in an image, the program causing a computer to execute the procedures of:
detecting a face or facial part in the image;
determining an estimated region which is estimated to include the human figure region, based on position information of the detected face or facial part;
extracting the human figure region in the estimated region;
judging whether at least a portion of the extracted human figure region exists in an outline periphery region in the estimated region;
extending and updating the estimated region so as to include a near outer region located near the human figure region in the outline periphery region and outside the estimated region, in the case where at least a portion of the human figure region has been judged to exist; and
extracting the human figure region in the extended and updated estimated region.
US11/819,465 2006-06-28 2007-06-27 Method, apparatus, and program for human figure region extraction Active 2030-08-17 US8041081B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP177454/2006 2006-06-28
JP2006-177454 2006-06-28
JP2006177454A JP4699298B2 (en) 2006-06-28 2006-06-28 Human body region extraction method, apparatus, and program

Publications (2)

Publication Number Publication Date
US20080002890A1 true US20080002890A1 (en) 2008-01-03
US8041081B2 US8041081B2 (en) 2011-10-18

Family

ID=38876711

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/819,465 Active 2030-08-17 US8041081B2 (en) 2006-06-28 2007-06-27 Method, apparatus, and program for human figure region extraction

Country Status (2)

Country Link
US (1) US8041081B2 (en)
JP (1) JP4699298B2 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100061658A1 (en) * 2008-09-08 2010-03-11 Hideshi Yamada Image processing apparatus, method, and program
US20110075926A1 (en) * 2009-09-30 2011-03-31 Robinson Piramuthu Systems and methods for refinement of segmentation using spray-paint markup
US20120093414A1 (en) * 2009-11-04 2012-04-19 Olaworks, Inc. Method, terminal device, and computer-readable recording medium for setting an initial value for a graph cut
US8611724B2 (en) 2010-06-28 2013-12-17 Brother Kogyo Kabushiki Kaisha Computer readable medium, information processing apparatus and method for processing moving image and sound
US20160196662A1 (en) * 2013-08-16 2016-07-07 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and device for manufacturing virtual fitting model image
US20170157788A1 (en) * 2015-12-04 2017-06-08 Urschel Laboratories, Inc. Stripper plates, dicing machines that utilize stripper plates, and methods of use
CN109033945A (en) * 2018-06-07 2018-12-18 西安理工大学 A kind of human body contour outline extracting method based on deep learning
CN114387335A (en) * 2021-11-30 2022-04-22 国网湖南省电力有限公司 Human body position estimation method based on monocular vision
US20220253549A1 (en) * 2021-02-08 2022-08-11 Capital One Services, Llc Methods and systems for automatically preserving a user session on a public access shared computer
CN117252903A (en) * 2023-11-10 2023-12-19 山东通广电子股份有限公司 Motion area extraction method and system based on image processing
CN120075634A (en) * 2023-11-22 2025-05-30 荣耀终端股份有限公司 Image blurring processing method and device

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009078957A1 (en) * 2007-12-14 2009-06-25 Flashfoto, Inc. Systems and methods for rule-based segmentation for objects with full or partial frontal view in color images
US8385609B2 (en) * 2008-10-21 2013-02-26 Flashfoto, Inc. Image segmentation
US9311567B2 (en) 2010-05-10 2016-04-12 Kuang-chih Lee Manifold learning and matting
JP2013074569A (en) * 2011-09-29 2013-04-22 Sanyo Electric Co Ltd Image processing device
JP7124281B2 (en) * 2017-09-21 2022-08-24 株式会社リコー Program, information processing device, image processing system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010005219A1 (en) * 1999-12-27 2001-06-28 Hideaki Matsuo Human tracking device, human tracking method and recording medium recording program thereof
US20060170769A1 (en) * 2005-01-31 2006-08-03 Jianpeng Zhou Human and object recognition in digital video
US7379591B2 (en) * 1999-07-29 2008-05-27 Fujiflim Corporation Method and device for extracting specified image subject

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100414432B1 (en) * 1995-03-24 2004-03-18 마츠시타 덴끼 산교 가부시키가이샤 Contour extraction device
JP3490559B2 (en) * 1995-11-14 2004-01-26 富士写真フイルム株式会社 Method for determining main part of image and method for determining copy conditions
US5933529A (en) * 1996-12-24 1999-08-03 Daewoo Electronics Co., Ltd. Method of tracing a contour of an object based on background information of the object
US6529630B1 (en) * 1998-03-02 2003-03-04 Fuji Photo Film Co., Ltd. Method and device for extracting principal image subjects
JP2000293687A (en) * 1999-02-02 2000-10-20 Minolta Co Ltd Three-dimensional shape data processor and three- dimensional shape data processing method
EP1107166A3 (en) * 1999-12-01 2008-08-06 Matsushita Electric Industrial Co., Ltd. Device and method for face image extraction, and recording medium having recorded program for the method
JP3760068B2 (en) * 1999-12-02 2006-03-29 本田技研工業株式会社 Image recognition device
JP2001175868A (en) * 1999-12-22 2001-06-29 Nec Corp Method and device for human detection
JP4409072B2 (en) * 2000-09-14 2010-02-03 本田技研工業株式会社 Outline extraction device, outline extraction method, and recording medium recording outline extraction program
US6697502B2 (en) * 2000-12-14 2004-02-24 Eastman Kodak Company Image processing method for detecting human figures in a digital image
TW505892B (en) * 2001-05-25 2002-10-11 Ind Tech Res Inst System and method for promptly tracking multiple faces
AUPR541801A0 (en) * 2001-06-01 2001-06-28 Canon Kabushiki Kaisha Face detection in colour images with complex background
US6879709B2 (en) * 2002-01-17 2005-04-12 International Business Machines Corporation System and method for automatically detecting neutral expressionless faces in digital images
JP3996015B2 (en) * 2002-08-09 2007-10-24 本田技研工業株式会社 Posture recognition device and autonomous robot
JP3855939B2 (en) * 2003-01-31 2006-12-13 ソニー株式会社 Image processing apparatus, image processing method, and photographing apparatus
JP4231320B2 (en) * 2003-03-31 2009-02-25 本田技研工業株式会社 Moving body detection device
EP1477924B1 (en) * 2003-03-31 2007-05-02 HONDA MOTOR CO., Ltd. Gesture recognition apparatus, method and program
US7324693B2 (en) * 2003-04-23 2008-01-29 Eastman Kodak Company Method of human figure contour outlining in images
JP4317465B2 (en) * 2004-02-13 2009-08-19 本田技研工業株式会社 Face identification device, face identification method, and face identification program
US7224831B2 (en) * 2004-02-17 2007-05-29 Honda Motor Co. Method, apparatus and program for detecting an object
US20050196015A1 (en) * 2004-03-02 2005-09-08 Trw Automotive U.S. Llc Method and apparatus for tracking head candidate locations in an actuatable occupant restraining system
JP2005339363A (en) 2004-05-28 2005-12-08 Canon Inc Human body part automatic dividing device and human body part automatic dividing method
US20090041297A1 (en) * 2005-05-31 2009-02-12 Objectvideo, Inc. Human detection and tracking for security applications
US7689011B2 (en) * 2006-09-26 2010-03-30 Hewlett-Packard Development Company, L.P. Extracting features from face regions and auxiliary identification regions of images for person recognition and other applications

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7379591B2 (en) * 1999-07-29 2008-05-27 Fujiflim Corporation Method and device for extracting specified image subject
US20010005219A1 (en) * 1999-12-27 2001-06-28 Hideaki Matsuo Human tracking device, human tracking method and recording medium recording program thereof
US20060170769A1 (en) * 2005-01-31 2006-08-03 Jianpeng Zhou Human and object recognition in digital video

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100061658A1 (en) * 2008-09-08 2010-03-11 Hideshi Yamada Image processing apparatus, method, and program
US8204308B2 (en) * 2008-09-08 2012-06-19 Sony Corporation Image processing apparatus, method, and program
US20110075926A1 (en) * 2009-09-30 2011-03-31 Robinson Piramuthu Systems and methods for refinement of segmentation using spray-paint markup
US8670615B2 (en) * 2009-09-30 2014-03-11 Flashfoto, Inc. Refinement of segmentation markup
US20120093414A1 (en) * 2009-11-04 2012-04-19 Olaworks, Inc. Method, terminal device, and computer-readable recording medium for setting an initial value for a graph cut
US8437550B2 (en) * 2009-11-04 2013-05-07 Intel Corporation Method, terminal device, and computer-readable recording medium for setting an initial value for a graph cut
EP2498221A4 (en) * 2009-11-04 2015-02-11 Intel Corp METHOD, TERMINAL DEVICE AND COMPUTER-READABLE RECORDING MEDIUM FOR DEFINING INITIAL VALUE FOR GRAPHICAL PARTITIONING
US8611724B2 (en) 2010-06-28 2013-12-17 Brother Kogyo Kabushiki Kaisha Computer readable medium, information processing apparatus and method for processing moving image and sound
US20160196662A1 (en) * 2013-08-16 2016-07-07 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and device for manufacturing virtual fitting model image
RU2632165C2 (en) * 2013-08-16 2017-10-02 Бэйцзин Цзиндун Шанкэ Информейшн Текнолоджи Ко, Лтд. Method and device for creating model image for virtual fitting
US20170157788A1 (en) * 2015-12-04 2017-06-08 Urschel Laboratories, Inc. Stripper plates, dicing machines that utilize stripper plates, and methods of use
CN109033945A (en) * 2018-06-07 2018-12-18 西安理工大学 A kind of human body contour outline extracting method based on deep learning
US20220253549A1 (en) * 2021-02-08 2022-08-11 Capital One Services, Llc Methods and systems for automatically preserving a user session on a public access shared computer
US11861041B2 (en) * 2021-02-08 2024-01-02 Capital One Services, Llc Methods and systems for automatically preserving a user session on a public access shared computer
US12189814B2 (en) 2021-02-08 2025-01-07 Capital One Services, Llc Methods and systems for automatically preserving a user session on a public access shared computer
CN114387335A (en) * 2021-11-30 2022-04-22 国网湖南省电力有限公司 Human body position estimation method based on monocular vision
CN117252903A (en) * 2023-11-10 2023-12-19 山东通广电子股份有限公司 Motion area extraction method and system based on image processing
CN120075634A (en) * 2023-11-22 2025-05-30 荣耀终端股份有限公司 Image blurring processing method and device

Also Published As

Publication number Publication date
JP2008009576A (en) 2008-01-17
JP4699298B2 (en) 2011-06-08
US8041081B2 (en) 2011-10-18

Similar Documents

Publication Publication Date Title
US8041081B2 (en) Method, apparatus, and program for human figure region extraction
US8023701B2 (en) Method, apparatus, and program for human figure region extraction
US12361701B2 (en) Image processing apparatus, training apparatus, image processing method, training method, and storage medium
US11120556B2 (en) Iterative method for salient foreground detection and multi-object segmentation
US7336819B2 (en) Detection of sky in digital color images
US8238660B2 (en) Hybrid graph model for unsupervised object segmentation
US7415165B2 (en) Red-eye detection device, red-eye detection method, and red-eye detection program
US8401292B2 (en) Identifying high saliency regions in digital images
CN110781836A (en) Human body recognition method and device, computer equipment and storage medium
US9317784B2 (en) Image processing apparatus, image processing method, and program
KR102320985B1 (en) Learning method and learning device for improving segmentation performance to be used for detecting road user events using double embedding configuration in multi-camera system and testing method and testing device using the same
JP6808783B2 (en) Image processing using artificial neural networks
US9418440B2 (en) Image segmenting apparatus and method
US20070047822A1 (en) Learning method for classifiers, apparatus, and program for discriminating targets
CN116740758A (en) A bird image recognition method and system to prevent misjudgment
JP2005190400A (en) Face image detection method, face image detection system, and face image detection program
US20110243426A1 (en) Method, apparatus, and program for generating classifiers
CN112989958A (en) Helmet wearing identification method based on YOLOv4 and significance detection
CN113191195B (en) Face detection method and system based on deep learning
EP3998577B1 (en) Object detection device, object detection method, and program
CN114724190A (en) Mood recognition method based on pet posture
CN115035390A (en) Aerial photography image detection method based on GAN and feature enhancement
KR20210089044A (en) Method of selecting training data for object detection and object detection device for detecting object using object detection model trained using method
CN110084190B (en) Real-time unstructured road detection method under severe illumination environment based on ANN
CN119152178A (en) Infrared small target detection optimization method and system based on data and characteristics

Legal Events

Date Code Title Description
AS Assignment

Owner name: FUJIFILM CORPORATION, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HU, YI;REEL/FRAME:019539/0758

Effective date: 20070604

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12