Summary of the invention
The object of the present invention is to provide a video-based real-time face detection method and a corresponding device that greatly reduce the computational complexity of the face detection process while still guaranteeing detection accuracy.
To solve the above technical problem, an embodiment of the present invention provides a video-based real-time face detection method comprising the following steps:
Video frames are divided in advance into two types, full-detection frames and predictive-detection frames. A full-detection frame is a frame in which the entire image area is searched exhaustively with a real-time face detection algorithm using a fixed small step. A predictive-detection frame is a frame in which the detection area is partitioned, based on prediction, into face candidate regions and non-face regions; the face candidate regions are searched with the fixed small step and the non-face regions with a fixed large step, the fixed large step being greater than the fixed small step;
If the frame interval between the currently input video frame and the most recent preceding full-detection frame is greater than a preset threshold, the currently input video frame is determined to be a full-detection frame and is detected in the full-detection mode; if that frame interval is less than or equal to the preset threshold, the currently input video frame is determined to be a predictive-detection frame and is detected in the predictive-detection mode;
After detection in the full-detection mode or in the predictive-detection mode, the region positions of all faces detected in the current video frame are output;
Wherein the first video frame is a full-detection frame.
An embodiment of the present invention also provides a video-based real-time face detection device comprising a video frame input module, a type judging module, a detection module, and a face region output module;
The video frame input module is used to input video frames to the type judging module;
The type judging module is used to judge the type of the currently input video frame. Video frames are of two types, full-detection frames and predictive-detection frames. A full-detection frame is a frame in which the entire image area is searched exhaustively with a real-time face detection algorithm using a fixed small step. A predictive-detection frame is a frame in which the detection area is partitioned, based on prediction, into face candidate regions and non-face regions; the face candidate regions are searched with the fixed small step and the non-face regions with a fixed large step, the fixed large step being greater than the fixed small step;
If the frame interval between the currently input video frame and the most recent preceding full-detection frame is greater than a preset threshold, the type judging module determines that the currently input video frame is a full-detection frame; if that frame interval is less than or equal to the preset threshold, the type judging module determines that the currently input video frame is a predictive-detection frame; wherein the first video frame is a full-detection frame;
The detection module is used to detect the currently input video frame according to the type determined by the type judging module;
The face region output module is used to output the region positions of all faces that the detection module has detected in the current video frame.
Compared with the prior art, the main differences and effects of the embodiments of the present invention are as follows:
Video frames are divided into two types: full-detection frames (frames in which the entire image area is searched exhaustively with the fixed small step) and predictive-detection frames (frames in which the face candidate regions are searched with the fixed small step and the non-face regions with the fixed large step). The type of the currently input video frame is decided according to the frame interval between it and the most recent preceding full-detection frame, the frame is detected in the mode corresponding to the decided type, and the region positions of all faces detected in the current video frame are output. Because a video sequence is predictable in the spatio-temporal domain, the detection area of a frame can be partitioned, based on prediction, into face candidate regions and non-face regions. Searching the face candidate regions with the fixed small step and the non-face regions with the fixed large step in some of the video frames therefore effectively reduces the number of searches required during detection. This avoids the drawback of the general AdaBoost-cascade-based algorithm, which searches the entire area with a single fixed step, so the computational complexity of the detection process is greatly reduced while detection accuracy is still guaranteed.
Further, if no face was detected in the video frame input immediately before the currently input video frame, the currently input video frame is likewise determined to be a full-detection frame and is detected in the full-detection mode. When no face was detected in the previous frame, the prediction-based partition of the detection area into face candidate regions and non-face regions may contain errors; determining the current frame to be a full-detection frame in that case further guarantees detection accuracy.
Further, the fixed large step is twice the fixed small step. If the difference between the fixed large step and the fixed small step is small, the reduction in the computational complexity of the detection process is limited; if the difference is too large, detection quality may degrade. Setting the fixed large step to twice the fixed small step therefore gives a good compromise between computational complexity and detection quality.
Further, after the current video frame has been detected in the full-detection mode or in the predictive-detection mode, the face region positions in the current video frame are taken as a reference and each region is enlarged by a suitable factor, here 1.25, to partition the detection area into face candidate regions and non-face regions. Because handheld devices are mostly embedded systems with limited computing power and storage, setting the enlargement factor to 1.25 (that is, dividing by 4 and multiplying by 5) allows the enlargement to be implemented with shift operations, which ensures a fast implementation on embedded platforms.
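As a minimal illustration of the shift-based enlargement described above, the following Python sketch enlarges a face box by 1.25 using only additions and right shifts. The function name enlarge_region, the (x, y, w, h) box layout, the centering of the enlarged box, and the clipping to the image bounds are all assumptions made for this example; the specification states only the factor and that shifts suffice.

```python
def enlarge_region(x, y, w, h, img_w, img_h):
    """Enlarge a face box (x, y, w, h) by 1.25x, clipped to the image.

    Hypothetical helper: box layout, centering, and clipping are assumptions.
    """
    # 1.25 * v == v + v / 4; for non-negative integers v / 4 is v >> 2.
    new_w = w + (w >> 2)
    new_h = h + (h >> 2)
    # Move the origin back by half of the added size so the box stays centered.
    new_x = max(0, x - ((new_w - w) >> 1))
    new_y = max(0, y - ((new_h - h) >> 1))
    # Clip the far edges to the image bounds.
    new_w = min(new_w, img_w - new_x)
    new_h = min(new_h, img_h - new_y)
    return new_x, new_y, new_w, new_h
```

For an 80 x 80 face at (40, 40) in a 320 x 240 image, this yields a 100 x 100 candidate region at (30, 30).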
Further, the video-based real-time face detection method of the present invention is applied to real-time face detection in handheld devices. Because handheld devices are mostly embedded systems with limited computing power and storage, single-frame face detection methods cannot meet their requirements. The real-time face detection method of the present invention greatly reduces computational complexity while still guaranteeing detection accuracy, so that handheld devices with limited computing power and storage can also detect faces robustly in real time.
Detailed description of the embodiments
In the following description, many technical details are set forth to help the reader better understand the present application. However, persons of ordinary skill in the art will appreciate that the technical solutions claimed in the claims of the present application can be realized even without these technical details and with various changes and modifications based on the following embodiments.
To make the objects, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The first embodiment of the present invention relates to a video-based real-time face detection method that can be applied in handheld devices such as mobile devices. In this embodiment, video frames are divided in advance into two types, full-detection frames and predictive-detection frames. A full-detection frame is a frame in which the entire image area is searched exhaustively with a real-time face detection algorithm using a fixed small step. A predictive-detection frame is a frame in which the detection area is partitioned, based on prediction, into face candidate regions and non-face regions; the face candidate regions are searched with the fixed small step and the non-face regions with a fixed large step, the fixed large step being greater than the fixed small step.
If the frame interval between the currently input video frame and the most recent preceding full-detection frame is greater than a preset threshold, the currently input video frame is determined to be a full-detection frame and is detected in the full-detection mode. Otherwise, the currently input video frame is determined to be a predictive-detection frame and is detected in the predictive-detection mode. The first video frame is a full-detection frame.
After detection in the full-detection mode or in the predictive-detection mode, the region positions of all faces detected in the current video frame are output.
A concrete implementation flow is described below. In this flow, the frame interval between the currently input video frame and the most recent preceding full-detection frame is defined as the prediction interval.
As shown in Fig. 1, in step 101, the prediction interval is set to zero.
Then, in step 102, a video frame is input. The input in this step may be data captured by a camera or a stored video file.
Then, in step 103, it is judged whether the prediction interval is zero. If so, the flow proceeds to step 104; otherwise it proceeds to step 105.
In step 104, the current video frame is determined to be a full-detection frame and face detection is performed on it in the full-detection mode; that is, the entire image area is searched exhaustively with a real-time face detection algorithm using the fixed small step. The specific procedure is described in detail with reference to Fig. 2.
In step 105, the current video frame is determined to be a predictive-detection frame and face detection is performed on it in the predictive-detection mode; that is, the detection area of the frame is partitioned, based on prediction, into face candidate regions and non-face regions, the face candidate regions are searched with the fixed small step, and the non-face regions are searched with the fixed large step, the fixed large step being greater than the fixed small step. The specific procedure is described in detail with reference to Fig. 3.
Then, in step 106, the prediction interval is incremented.
Then, in step 107, it is judged whether the following condition is satisfied: the prediction interval has reached the preset threshold, or no face has been detected. If the condition is satisfied, the prediction interval is reset to zero (step 108). If it is not satisfied (that is, the prediction interval has not reached the preset threshold and a face has been detected), the flow proceeds to step 109, where the region positions of all faces detected in the current frame are output and, taking each face region as a reference, the region is enlarged by a suitable factor to partition the detection area into face candidate regions and non-face regions. As an example, the enlargement factor is 1.25. Because handheld devices are mostly embedded systems with limited computing power and storage, an enlargement factor of 1.25 (that is, dividing by 4 and multiplying by 5) can be implemented with shift operations alone and therefore runs quickly on embedded platforms. It will be appreciated that other factors, such as 1.5 or 2, may also be used in practical applications.
Then, in step 110, it is judged whether all video frames have been processed. If so, the face detection process exits; otherwise the flow returns to step 102 to process the next video frame.
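The flow of steps 101 to 110 can be sketched in Python as follows. This is only an illustrative sketch: run_detection, detect_full, and detect_predictive are hypothetical names, the two detectors are passed in as callables returning a list of face boxes, the enlargement of step 109 is omitted, and the sketch outputs the detected faces for every frame (as the method statement requires) even on the branch that resets the interval.

```python
def run_detection(frames, detect_full, detect_predictive, threshold):
    """Yield (mode, faces) for each frame, following steps 101-110.

    detect_full(frame) and detect_predictive(frame, candidates) are
    hypothetical callables that return a list of detected face boxes.
    """
    interval = 0       # step 101: prediction interval starts at zero
    candidates = []    # face regions carried over from the previous frame
    for frame in frames:                         # steps 102 and 110
        if interval == 0:                        # step 103
            mode, faces = "full", detect_full(frame)              # step 104
        else:
            mode, faces = "predictive", detect_predictive(frame, candidates)  # step 105
        interval += 1                            # step 106
        if interval >= threshold or not faces:   # step 107
            interval = 0                         # step 108: next frame is a full-detection frame
        candidates = list(faces)                 # step 109, enlargement omitted here
        yield mode, faces
```

With a threshold of 3 and a face present in every frame, the sketch alternates one full-detection frame with two predictive-detection frames; if a predictive-detection frame finds no face, the next frame falls back to full detection.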
It is easy to see that, in this embodiment, video frames are divided into two types: full-detection frames (frames in which the entire image area is searched exhaustively with the fixed small step) and predictive-detection frames (frames in which the face candidate regions are searched with the fixed small step and the non-face regions with the fixed large step). The type of the currently input video frame is decided according to the frame interval between it and the most recent preceding full-detection frame, the frame is detected in the mode corresponding to the decided type, and the region positions of all faces detected in the current video frame are output. Because a video sequence is predictable in the spatio-temporal domain, the detection area of a frame can be partitioned, based on prediction, into face candidate regions and non-face regions. Searching the face candidate regions with the fixed small step and the non-face regions with the fixed large step in some of the video frames therefore effectively reduces the number of searches required during detection. This avoids the drawback of the general AdaBoost-cascade-based algorithm, which searches the entire area with a single fixed step, so the computational complexity of the detection process is greatly reduced while detection accuracy is still guaranteed. As a result, handheld devices with limited computing power and storage can also detect faces robustly in real time. It will also be appreciated that the video-based real-time face detection method of this embodiment can be applied in other equipment, such as desktop devices or any other equipment that requires real-time face detection.
It is worth mentioning that the flow shown in Fig. 1 is only one specific way of realizing this embodiment. In this flow, the prediction interval is incremented after the current frame has been detected, and it is automatically reset to zero when it reaches the preset threshold or when no face is detected. This ensures that a frame is treated as a predictive-detection frame only if its frame interval from the most recent preceding full-detection frame does not exceed the threshold and a face was detected in the previous frame. In practical applications, other similar flows are possible. For example, the prediction interval may be incremented after a video frame is input, and it is then judged whether the prediction interval is greater than the preset threshold. If it is, the frame is detected in the full-detection mode and the prediction interval is reset to zero. If it is less than or equal to the preset threshold, it is further judged whether a face was detected in the video frame input immediately before the current one; if so, the currently input video frame is determined to be a predictive-detection frame. If no face was detected in the previous frame, the currently input video frame is determined to be a full-detection frame and is detected in the full-detection mode.
When no face was detected in the previous frame, the prediction-based partition of the detection area into face candidate regions and non-face regions may contain errors; determining the current frame to be a full-detection frame in that case further guarantees detection accuracy.
In practical applications, however, it is also possible to determine directly that the currently input video frame is a predictive-detection frame whenever the prediction interval is less than or equal to the preset threshold, without checking whether a face was detected in the previous frame.
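The alternative decision logic just described, including the optional previous-frame check, can be sketched as a single function. The name classify_frame and its parameters are hypothetical; interval stands for the prediction interval of the current frame, and check_prev=False corresponds to the variant that skips the previous-frame check.

```python
def classify_frame(interval, threshold, prev_had_face, check_prev=True):
    """Return "full" or "predictive" for the currently input frame.

    interval:      frames since the most recent full-detection frame
    prev_had_face: whether a face was detected in the previous frame
    check_prev:    set to False for the variant without the previous-frame check
    """
    if interval > threshold:
        return "full"          # too long since the last exhaustive search
    if check_prev and not prev_had_face:
        return "full"          # prediction is unreliable without a previous face
    return "predictive"
```

A caller would reset its interval counter to zero whenever this function returns "full", and would treat the first frame of the sequence as a full-detection frame.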
The step of performing face detection in the full-detection mode in this embodiment (step 104 in Fig. 1) is described in detail below.
As shown in Fig. 2, in step 201, the vertical search step step_y in the column direction is set; as an example, this step is set to 2. The current band is then scanned in the horizontal direction.
Then, in step 202, the horizontal search step step_x in the row direction is set; as an example, this step is set to 2.
Then, in step 203, the face detection process is performed on the current window.
Then, in steps 204 and 204', the window is moved to the next position by the horizontal step step_x. If that window exists, the flow returns to step 203; otherwise the row direction is finished and the flow proceeds to step 205.
Then, in steps 205 and 205', the flow moves to the next detection band by the vertical step step_y. If that band exists, the flow returns to step 202; otherwise detection of the entire frame is finished.
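The raster scan of steps 201 to 205' can be sketched as a generator of window positions. The function name full_scan and the single fixed window size win are assumptions made for this example (the specification does not state how window sizes are handled); the per-window classification of step 203 is left to the caller.

```python
def full_scan(img_w, img_h, win, step_x=2, step_y=2):
    """Yield the top-left corner (x, y) of every window in a full scan.

    Hypothetical sketch: a single square window of side `win` is assumed.
    """
    y = 0
    while y + win <= img_h:        # step 205': does the band exist?
        x = 0                      # step 202: start scanning the band
        while x + win <= img_w:    # step 204': does the window exist?
            yield x, y             # step 203: classify this window
            x += step_x            # step 204: next window in the row direction
        y += step_y                # step 205: next band in the column direction
```

For an 8 x 8 image and a 4 x 4 window, the scan visits the positions 0, 2, and 4 along each axis, that is, nine windows.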
The step of performing face detection in the predictive-detection mode in this embodiment (step 105 in Fig. 1) is described in detail below.
As shown in Fig. 3, in step 301, the column-direction detection window steps are set: step_y1 for non-face regions and step_y2 for face regions. As an example, step_y1 is 4 and step_y2 is 2. The current band is then scanned in the horizontal direction.
Then, in step 302, the row-direction detection window steps are set: step_x1 for non-face regions and step_x2 for face regions. As an example, step_x1 is 4 and step_x2 is 2.
Then, in step 303, the face detection process is performed on the current window.
Then, in step 304, it is judged whether the current window lies in a face region in the row direction. If so, the flow proceeds to step 304', where the window moves to the next position by the row-direction step step_x2; otherwise it proceeds to step 304'', where the window moves to the next position by the step step_x1.
Then, in step 305, it is judged whether detection in the row direction is finished, that is, whether the new window exists. If the window exists, the row direction is not finished and the flow returns to step 303; otherwise the row direction is finished and the flow proceeds to step 306.
In steps 306 and 307, it is judged whether the current window lies in a face region in the column direction. If so, the flow moves to the next detection band by the column-direction step step_y2; otherwise it moves to the next band by the step step_y1. If that band exists, detection in the column direction is not yet finished and the flow returns to step 302; otherwise detection of the entire frame is finished.
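The variable-step scan of steps 301 to 307 can be sketched in the same style. The names in_span and predictive_scan, the single fixed window size, and the use of an overlap test to decide whether a window "lies in a face region" along one axis are all assumptions made for this example; the specification does not define that test precisely.

```python
def in_span(pos, win, spans):
    """True if the interval [pos, pos + win) overlaps any (start, end) span."""
    return any(pos < end and pos + win > start for start, end in spans)


def predictive_scan(img_w, img_h, win, regions, small=2, large=4):
    """Yield window corners, stepping finely only near face candidate regions.

    regions: list of candidate boxes (x, y, w, h); hypothetical layout.
    """
    x_spans = [(x, x + w) for x, y, w, h in regions]
    y_spans = [(y, y + h) for x, y, w, h in regions]
    y = 0
    while y + win <= img_h:                    # does the band exist?
        x = 0
        while x + win <= img_w:                # step 305: does the window exist?
            yield x, y                         # step 303: classify this window
            # step 304: small step inside a face span, large step elsewhere
            x += small if in_span(x, win, x_spans) else large
        # steps 306-307: the same choice in the column direction
        y += small if in_span(y, win, y_spans) else large
```

In a toy case of a 16 x 16 image, a 4 x 4 window, and one candidate region at (4, 4) of size 4 x 4, this scan visits 25 windows, compared with 49 when the small step of 2 is used everywhere and 16 when there are no candidate regions at all.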
It is easy to see that in this embodiment the fixed large step is 4 and the fixed small step is 2. Of course, in practical applications the fixed large step and the fixed small step may be set to other values, for example a fixed large step of 2 and a fixed small step of 1. The reason the fixed large step is set to twice the fixed small step in this embodiment is as follows: if the difference between the two steps is small, the reduction in the computational complexity of the detection process is limited; if the difference is too large, detection quality may degrade. Setting the fixed large step to twice the fixed small step therefore gives a good compromise between computational complexity and detection quality. It will also be appreciated that other multiples, such as 1.5, 2.5, or 4, may be used in practical applications.
The face detection process performed on the current window (steps 203 and 303 in Figs. 2 and 3) is described in detail below.
As shown in Fig. 4, in step 401, the detection parameters of the current detection window are normalized so that differences in illumination conditions do not affect the detector.
Then, in step 402, the detector stage counter is set to zero.
Then, in step 403, if the detector stage counter is less than the total number of detector stages, the current stage is evaluated; otherwise a face has been detected, and the flow proceeds to step 412, where the face position is output and the face detection process exits.
Then, in step 404, the accumulated sum of the current detector stage is set to zero.
Then, in step 405, the weak classifier counter of the current detector stage is set to zero.
Then, in step 406, it is judged whether the weak classifier counter of the current detector stage is less than the total number of weak classifiers in that stage. If so, the flow proceeds to step 407; otherwise it proceeds to step 410.
Then, in steps 407 and 407', the feature value of the weak classifier is computed, and it is judged whether the feature value is less than the threshold of the current weak classifier. If so, the flow proceeds to step 408; otherwise it proceeds to step 409.
In step 408, the alpha value of the weak classifier is added to the accumulated sum of the current detector stage.
In step 409, the weak classifier counter is incremented and the flow returns to step 406.
In step 410, it is judged whether the accumulated sum of the current detector stage is less than the stage threshold. If it is, the face detection process exits; if it is greater than or equal to the threshold, the flow proceeds to step 411.
Then, in step 411, the detector stage counter is incremented and the flow returns to step 403.
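The cascade evaluation of steps 402 to 412 can be sketched as two nested loops. The name run_cascade and the data layout (each stage as a list of weak classifiers plus a stage threshold, each weak classifier as a feature function, a threshold, and an alpha weight) are assumptions made for this example; the normalization of step 401 is omitted, and the sketch assumes that step 408 continues to step 409.

```python
def run_cascade(window, stages):
    """Return True if `window` passes every stage of the cascade.

    stages: list of (weak_classifiers, stage_threshold), where each weak
    classifier is a tuple (feature_fn, threshold, alpha). Hypothetical layout.
    """
    for weak_classifiers, stage_threshold in stages:           # steps 402, 403, 411
        stage_sum = 0.0                                        # step 404
        for feature_fn, threshold, alpha in weak_classifiers:  # steps 405, 406, 409
            if feature_fn(window) < threshold:                 # steps 407, 407'
                stage_sum += alpha                             # step 408
        if stage_sum < stage_threshold:                        # step 410
            return False   # rejected early: not a face
    return True            # step 412: all stages passed, face detected
```

Because a window is rejected as soon as one stage sum falls below its threshold, most non-face windows cost only the first few stages, which is why reducing the number of windows searched translates directly into reduced computation.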
It should be noted that the flows shown in Figs. 2, 3, and 4 are technical details of one realization of this embodiment; other flows may be designed in practical applications, and they are not described one by one here.
Each method embodiment of the present invention may be implemented in software, hardware, firmware, or the like. Regardless of whether the present invention is implemented in software, hardware, or firmware, the instruction code may be stored in any type of computer-accessible memory (for example, permanent or modifiable, volatile or non-volatile, solid-state or non-solid-state, fixed or removable media, and so on). Likewise, the memory may be, for example, programmable array logic (PAL), random access memory (RAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), a magnetic disk, an optical disc, a digital versatile disc (DVD), and so on.
The second embodiment of the present invention relates to a video-based real-time face detection device suitable for use in handheld devices. As shown in Fig. 5, the device comprises a video frame input module, a type judging module, a detection module, and a face region output module.
The video frame input module is used to input video frames to the type judging module.
The type judging module is used to judge the type of the currently input video frame. Video frames are of two types, full-detection frames and predictive-detection frames. A full-detection frame is a frame in which the entire image area is searched exhaustively with a real-time face detection algorithm using a fixed small step. A predictive-detection frame is a frame in which the detection area is partitioned, based on prediction, into face candidate regions and non-face regions; the face candidate regions are searched with the fixed small step and the non-face regions with a fixed large step, the fixed large step being greater than the fixed small step. The fixed large step may be set to twice the fixed small step.
If the frame interval between the currently input video frame and the most recent preceding full-detection frame is greater than a preset threshold, the type judging module determines that the currently input video frame is a full-detection frame. If that frame interval is less than or equal to the preset threshold, the type judging module determines that the currently input video frame is a predictive-detection frame. The first video frame is a full-detection frame.
The detection module is used to detect the currently input video frame according to the type determined by the type judging module. After the current video frame has been detected in the full-detection mode or in the predictive-detection mode, the face region positions in the current video frame are taken as a reference and each region is enlarged by a suitable factor, for example 1.25, to partition the detection area into face candidate regions and non-face regions.
The face region output module is used to output the region positions of all faces that the detection module has detected in the current video frame.
The type judging module is also used to judge whether a face was detected in the video frame input immediately before the currently input video frame.
If the frame interval between the currently input video frame and the most recent preceding full-detection frame is less than or equal to the preset threshold, then, before determining that the currently input video frame is a predictive-detection frame, the type judging module first judges whether a face was detected in the video frame input immediately before it. If a face was detected, the type judging module determines that the currently input video frame is a predictive-detection frame. If no face was detected, the type judging module determines that the currently input video frame is a full-detection frame.
It is easy to see that the first embodiment is the method embodiment corresponding to this embodiment, and this embodiment can be implemented in cooperation with the first embodiment. The relevant technical details mentioned in the first embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied in the first embodiment.
It should be noted that each unit mentioned in the device embodiments of the present invention is a logical unit. Physically, a logical unit may be a physical unit, a part of a physical unit, or a combination of several physical units. The physical implementation of these logical units is not what matters most; the combination of functions they realize is the key to solving the technical problem addressed by the present invention. In addition, to highlight the innovative part of the present invention, the above device embodiments do not introduce units that are not closely related to solving that technical problem, which does not mean that no other units exist in those device embodiments.
Although the present invention has been illustrated and described with reference to certain preferred embodiments thereof, those of ordinary skill in the art should understand that various changes in form and detail may be made without departing from the spirit and scope of the present invention.