CN115457592B

CN115457592B - Pedestrian recognition method and device

Info

Publication number: CN115457592B
Application number: CN202110644569.1A
Authority: CN
Inventors: 吴扬峰
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Xiongan ICT Co Ltd; China Mobile System Integration Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Xiongan ICT Co Ltd; China Mobile System Integration Co Ltd
Priority date: 2021-06-09
Filing date: 2021-06-09
Publication date: 2025-08-22
Anticipated expiration: 2041-06-09
Also published as: CN115457592A

Abstract

The invention provides a pedestrian recognition method and device. The method comprises: after determining that the tracking identification (trackID) of a target person has been lost while tracking frame images in a target video, obtaining a predicted track of the target person based on a target candidate sequence and a historical position sequence; determining whether the target person is occluded or has left the frame boundary; processing the target candidate sequence differently depending on which state applies; and identifying the target person with an REID algorithm and face recognition based on the target candidate sequence during the trackID loss period. Because the pedestrian's motion trajectory is predicted while the trackID is lost in the video frame images, the target candidate sequence can still be processed during that period. This improves the stability and continuity of target tracking under occlusion. When the target leaves the boundary, video stitching is used to verify the correctness of the pedestrian's motion trajectory, which improves the accuracy of pedestrian recognition.

Description

Pedestrian recognition method and device

Technical Field

The invention relates to the field of Internet technology applications, and in particular to a pedestrian recognition method and device.

Background

With the development of modern cities, surveillance camera coverage increases year by year. Image and video applications play an increasingly prominent role in people's daily lives, and they greatly facilitate searches for missing persons and the tracking of suspicious persons. However, manually finding a target person in surveillance video is extremely labor-intensive, and targets are easily missed.

At present, in fields such as surveillance and security, persons are mainly searched for and tracked by face recognition. In surveillance video, however, images are limited by the resolution and shooting angle of the camera, so high-quality face images often cannot be obtained. Pedestrian re-identification (REID) technology has become increasingly relevant to daily life in recent years. Pedestrian re-identification is the problem of retrieving images of the same pedestrian across cameras, given a pedestrian image taken by one camera. In practical applications, pedestrian re-identification and face recognition are often used as substitutes for each other. Face recognition relies on captured facial features for recognition and matching, but good-quality face images are hard to capture in real scenes, so its success rate in person search is uncertain. Pedestrian re-identification, meanwhile, can only produce a simple motion trajectory. It cannot link the pedestrian's motion trajectories under the individual surveillance cameras together, and therefore cannot support analysis of the target's potential intentions.

Disclosure of Invention

To address the problems in the prior art, the invention provides a pedestrian recognition method and device.

In a first aspect, the present invention provides a method for pedestrian recognition, comprising:

after determining that the tracking identification (trackID) of a target person has been lost while tracking frame images in a target video, acquiring a predicted track of the target person based on a target candidate sequence and a historical position sequence;

determining, based on the predicted track of the target person, whether the target person is occluded or has left the boundary;

if the target person is occluded, correcting and supplementing the target candidate sequence based on the predicted track of the target person;

if the target person has left the boundary, stitching together videos from cameras that are adjacent in space, starting from the same predicted start time, and acquiring the tracking identifications (trackIDs) of pedestrians from the stitched video frame images to generate a target candidate sequence;

identifying the target person by using an REID algorithm and face recognition based on the target candidate sequence during the trackID loss period;

wherein the target candidate sequence is obtained by acquiring the trackIDs of pedestrians from frame images in the target video, and comprises the trackID and bounding box of each pedestrian;

and the historical position sequence is obtained, after the target person has been determined, from the position information of the target person in a preset number of consecutive frame images in the target video.

In one embodiment, acquiring the tracking identification (trackID) of a pedestrian comprises:

detecting pedestrians in the frame images of the target video, and determining a trackID and a bounding box for each pedestrian;

tracking the pedestrians by determining the similarity between pedestrians in frame images at adjacent time instants, associating pedestrians whose similarity meets the threshold, and assigning them the same trackID.

In one embodiment, obtaining the predicted track of the target person based on the target candidate sequence and the historical position sequence comprises:

predicting the track of the target person by combining the target candidate sequence and the historical position sequence, using the starNet algorithm based on global information interaction;

wherein the predicted track of the target person includes position information.

In one embodiment, if the target person is occluded, then before the target candidate sequence is corrected and supplemented based on the predicted track of the target person, the method further comprises:

determining an occlusion state based on the position information of the predicted track of the target person and the pedestrian detection information of the current frame image, the occlusion state being either severe or not severe;

if the occlusion is severe, predicting the bounding box of the target person at the current moment based on the position information of the target person at the current moment in the predicted track and the bounding box of the target person at the previous moment;

if the occlusion is not severe, predicting the bounding box of the target person at the current moment based on the position information of the target person at the current moment in the predicted track, and on the bounding boxes of the target person at the current moment and the previous moment in the actual frame images.

In one embodiment, if the occlusion is severe, predicting the bounding box of the target person at the current moment based on the position information of the target person at the current moment in the predicted track and the bounding box of the target person at the previous moment comprises:

determining a first width and a first height from the bounding box of the target person at the previous moment;

predicting the bounding box of the target person at the current moment based on the position information of the target person at the current moment in the predicted track, the predicted bounding box having the first width and the first height;

and if the occlusion is not severe, predicting the bounding box of the target person at the current moment based on the position information of the target person at the current moment in the predicted track, and on the bounding boxes of the target person at the current moment and the previous moment in the actual frame images, comprises:

determining a first width, a first height and a first aspect ratio from the bounding box of the target person at the previous moment;

determining a second width, a second height and a second aspect ratio from the bounding box of the target person at the current moment;

and predicting the bounding box of the target person at the current moment based on the first aspect ratio, the second aspect ratio and the position information of the target person at the current moment in the predicted track.

In one embodiment, if the target person is occluded, correcting and supplementing the target candidate sequence based on the predicted track of the target person comprises:

correcting the target candidate sequence based on the position information of the predicted track of the target person, combined with the position information from pedestrian detection in the actual frame images;

verifying the corrected target candidate sequence by using an REID algorithm and the historical position sequence;

determining the correct predicted track of the target person based on the verified target candidate sequence;

and supplementing the target candidate sequence based on the correct predicted track of the target person and the predicted bounding box of the target person.

In one embodiment, identifying the target person by using an REID algorithm and face recognition based on the target candidate sequence during the trackID loss period comprises:

acquiring a target query image, determining the target person in the target candidate sequence based on the REID algorithm, face recognition and similarity ranking, and determining the historical position sequence of the target person;

wherein the similarity between the target query image and the determined target person meets a preset threshold and ranks highest.
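The similarity-ranking step can be illustrated with a minimal Python sketch. It makes several assumptions:

- Similarity is cosine similarity between REID and face feature vectors.
- The face score is blended in only when both the query and the candidate have a usable face, reflecting REID as the primary cue and face recognition as the auxiliary one.
- The weights `w_reid=0.7` and `w_face=0.3` and the threshold `0.6` are hypothetical.

The function name `identify_target` is also hypothetical. None of these values come from the patent.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (trackID, REID feature, face feature or None if no usable face was captured)
Candidate = Tuple[int, Sequence[float], Optional[Sequence[float]]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify_target(query_reid: Sequence[float],
                    query_face: Optional[Sequence[float]],
                    candidates: List[Candidate],
                    threshold: float = 0.6,  # hypothetical preset threshold
                    w_reid: float = 0.7,     # hypothetical weight: REID is primary
                    w_face: float = 0.3) -> Optional[int]:
    """Rank candidates by similarity to the query image and return the
    top-ranked trackID if its score meets the threshold, otherwise None."""
    scored = []
    for track_id, reid_feat, face_feat in candidates:
        score = cosine(query_reid, reid_feat)
        # Face similarity is auxiliary: blend it in only when both faces exist.
        if query_face is not None and face_feat is not None:
            score = w_reid * score + w_face * cosine(query_face, face_feat)
        scored.append((score, track_id))
    if not scored:
        return None
    best_score, best_id = max(scored, key=lambda s: s[0])
    return best_id if best_score >= threshold else None
```

Falling back to the REID score alone when no face is available keeps a poor face capture from blocking identification, which matches the stated goal of reducing dependence on face quality.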

In a second aspect, the invention further provides a pedestrian recognition device, comprising a memory, a transceiver and a processor;

wherein the memory is configured to store a computer program, the transceiver is configured to send and receive data under the control of the processor, and the processor is configured to execute the computer program in the memory to implement the following steps:

In a third aspect, the invention further provides a pedestrian recognition apparatus, comprising:

a predicted track module, configured to acquire a predicted track of a target person based on a target candidate sequence and a historical position sequence after determining that the tracking identification (trackID) of the target person has been lost while tracking frame images in the target video;

a state confirmation module, configured to determine, based on the predicted track of the target person, whether the target person is occluded or has left the boundary;

a correction and supplementation module, configured to correct and supplement the target candidate sequence based on the predicted track of the target person if the target person is occluded;

a generation module, configured to, if the target person has left the boundary, stitch together videos from cameras that are adjacent in space, starting from the same predicted start time, acquire the trackIDs of pedestrians from the stitched video frame images, and generate a target candidate sequence;

and an identification module, configured to identify the target person by using an REID algorithm and face recognition based on the target candidate sequence during the trackID loss period.

In a fourth aspect, the present invention also provides a processor-readable storage medium storing a computer program for causing the processor to perform the steps of the method of pedestrian recognition as described in the first aspect.

With the pedestrian recognition method and device of the invention, the pedestrian's motion trajectory is predicted after the pedestrian's tracking identification in the video frame images is lost, so that the target candidate sequence can be processed during the loss period. This improves the stability and continuity of target tracking under occlusion. When the target leaves the boundary, video stitching is used to improve the accuracy of pedestrian recognition.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method of pedestrian recognition provided by the present invention;

FIG. 2 is a network architecture diagram for implementing REID using deep network technology;

FIG. 3 is a block diagram of a starNet network;

FIG. 4 is an overall flow chart of a method of pedestrian recognition provided by the present invention;

FIG. 5 is a schematic diagram of the pedestrian recognition apparatus provided by the invention;

FIG. 6 is a schematic structural diagram of a pedestrian recognition device provided by the invention.

Detailed Description

The term "and/or" in the present invention describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.

The term "plurality" in the present invention means two or more, and other quantifiers are to be understood similarly.

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Fig. 1 is a flow chart of a pedestrian recognition method provided by the invention. As shown in fig. 1, the method comprises the steps of:

Step 101, after determining that the tracking identification (trackID) of a target person has been lost while tracking frame images in a target video, acquiring a predicted track of the target person based on a target candidate sequence and a historical position sequence;

Specifically, the widespread use of surveillance cameras makes it easier to identify pedestrians and search for people. A captured face image allows a pedestrian to be identified fairly accurately, but good-quality face images are hard to capture in real scenes, so the success rate of person search is low. In addition, a person may not be found because of occlusion, or because they move outside the monitoring range of a given camera.

Therefore, the invention acquires the target video, performs detection on each of its frame images to obtain a bounding box for each pedestrian in each frame image, and assigns a trackID to each pedestrian so that every pedestrian in the frame images of the target video can be identified. Each pedestrian's bounding box must completely cover the pedestrian's body. The box may be represented in various forms, as long as it is uniquely determined. For example, the bounding box may be a quadrilateral, a circle, or some other irregular shape. A quadrilateral may be represented by the coordinates of its four vertices (x1, y1, x2, y2, x3, y3, x4, y4), by the coordinates of two diagonally opposite vertices (x5, y5, x6, y6), or by the coordinates of its center point together with its width and height, among other forms. In the invention, a pedestrian's bounding box is mainly represented by the coordinates of two diagonally opposite vertices of the quadrilateral. The target candidate sequence is formed from the trackID of each pedestrian and the corresponding bounding box.

The target person is the pedestrian provisionally found to have the highest similarity to the search object. After the target person has been determined, the target person's position information in a preset number of consecutive frame images is stored to form the historical position sequence. The position information may be the coordinates of the target person's center or the target person's bounding box. The two are equivalent in that either identifies the target person's position in the corresponding video frame image. The preset number may be chosen according to the actual situation and is not limited herein.
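As a minimal sketch, the two sequences can be held in simple Python containers. The class names `TargetCandidateSequence` and `HistoricalPositionSequence` are hypothetical. The sketch assumes two things:

- Boxes are stored as two diagonal corners (x1, y1, x2, y2), as described above.
- The historical sequence keeps the box center for the last preset number of frames, using a bounded `collections.deque`.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): two diagonal corners
Point = Tuple[float, float]


@dataclass
class TargetCandidateSequence:
    """Per-frame record of every pedestrian's trackID and bounding box."""
    frames: Dict[int, Dict[int, Box]] = field(default_factory=dict)

    def add(self, frame: int, track_id: int, box: Box) -> None:
        self.frames.setdefault(frame, {})[track_id] = box

    def boxes_of(self, track_id: int) -> List[Tuple[int, Box]]:
        """All (frame, box) pairs recorded for one trackID, in frame order."""
        return [(f, boxes[track_id])
                for f, boxes in sorted(self.frames.items())
                if track_id in boxes]


class HistoricalPositionSequence:
    """Centers of the target person over the last `preset_len` consecutive frames."""

    def __init__(self, preset_len: int) -> None:
        # A bounded deque drops the oldest entry automatically when full.
        self.positions: Deque[Point] = deque(maxlen=preset_len)

    def push_box(self, box: Box) -> None:
        x1, y1, x2, y2 = box
        self.positions.append(((x1 + x2) / 2, (y1 + y2) / 2))
```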

When a pedestrian's trackID is lost in the frame images of the target video, the target person's motion trajectory is predicted based on the target candidate sequence and the historical position sequence. The predicted trajectory contains all of the target person's position information during the trackID loss period, mainly the target person's position coordinates in the video frame images.

Step 102, determining, based on the predicted track of the target person, whether the target person is occluded or has left the boundary;

Specifically, the target person's position coordinates in the video frame image are obtained from the target person's predicted track. If the position coordinates fall outside the boundary of the frame images in the video, the target person is determined to have left the boundary. Otherwise, the target person is determined to be occluded.
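The boundary test can be sketched as a single comparison. The `loss_state` function is hypothetical. It assumes pixel coordinates with the origin at the top-left corner of a frame of size `frame_width` x `frame_height`, and it checks one predicted position. Applying it to every position of the predicted track is an equally possible reading of the step above.

```python
from typing import Tuple


def loss_state(predicted_pos: Tuple[float, float],
               frame_width: int, frame_height: int) -> str:
    """Classify a lost track from one predicted position.

    Returns "left_boundary" if the position lies outside the frame
    (origin at the top-left corner), otherwise "occluded".
    """
    x, y = predicted_pos
    inside = 0 <= x < frame_width and 0 <= y < frame_height
    return "occluded" if inside else "left_boundary"
```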

Step 103, if the target person is occluded, correcting and supplementing the target candidate sequence based on the predicted track of the target person;

Specifically, if the target person is determined to be occluded, the position coordinates in the target person's predicted track are compared with the motion tracks of all pedestrians obtained from actual pedestrian detection and tracking, and the target candidate sequence is corrected and supplemented accordingly.

Step 104, if the target person has left the boundary, stitching together videos from cameras that are adjacent in space, starting from the same predicted start time, and acquiring the trackIDs of pedestrians from the stitched video frame images to generate a target candidate sequence;

Specifically, if the target person is determined to have left the boundary, that is, the target person's position coordinates at the current moment fall outside the range of the frame images in the video, the video files of the cameras that are adjacent in actual geographic space are stitched together starting from the same predicted start time. Stitching uses the Parallax-Robust Surveillance Video Stitching algorithm. During stitching, this algorithm detects whether a moving object crosses the seam and, if so, updates the camera pixel mapping matrix, which improves the stitching result.

Step 105, identifying the target person by using an REID algorithm and face recognition based on the target candidate sequence during the trackID loss period;

Specifically, the invention implements REID with deep network technology, using the network structure shown in FIG. 2. ResNet-50 (a 50-layer residual network) serves as the backbone network. High-resolution feature maps are obtained by up-sampling, and fused features are then obtained through convolution. A BatchNorm (batch normalization) layer is added before the fully connected layer to improve convergence during training.

REID is used as the primary technique and face recognition as an auxiliary one. This reduces the influence of face-capture quality on determining the target, while temporal and spatial correlations are fully exploited to identify the target person.

Optionally, the method for acquiring the trackID of a pedestrian includes:

Specifically, before searching for the target person, pedestrian detection and tracking are performed on all persons in the video frame images to form the target candidate sequence.

1) Pedestrian detection is implemented with a deep convolutional network. The network detects in real time whether pedestrians are present in the frame image and localizes them accurately, i.e., determines the bounding box of each pedestrian. A trackID is assigned to each new pedestrian;

The typical workflow of the tracking algorithm is as follows:

1.1 take the original frames of a given video;

1.2 run an object detector to obtain the bounding boxes of all pedestrians in the video frame image;

1.3 for each detected pedestrian, compute various features, typically visual and motion features;

1.4 use a similarity formula to determine the probability that two pedestrians belong to the same target;

1.5 if the similarity of two pedestrians is greater than a preset threshold, associate them and assign them the same trackID.
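Steps 1.2 to 1.5 can be sketched as a greedy frame-to-frame association. This is an illustrative simplification, not the patent's tracker. It assumes each detection is already described by an appearance feature vector, omits motion features, and uses cosine similarity as the similarity formula. The class name `SimpleTracker` and the default threshold `0.5` are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SimpleTracker:
    """Greedy frame-to-frame association of pedestrians (steps 1.2 to 1.5)."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold                # hypothetical preset threshold
        self.tracks: Dict[int, List[float]] = {}  # trackID -> latest feature
        self._next_id = itertools.count(1)

    def update(self, features: Sequence[Sequence[float]]) -> List[int]:
        """Return one trackID per detection in the current frame."""
        # Step 1.4: similarity of every (detection, existing track) pair, best first.
        pairs = sorted(((cosine(f, tf), i, tid)
                        for i, f in enumerate(features)
                        for tid, tf in self.tracks.items()),
                       reverse=True)
        assigned: Dict[int, int] = {}
        used = set()
        for sim, i, tid in pairs:
            if sim <= self.threshold:  # step 1.5: similarity must exceed threshold
                break
            if i in assigned or tid in used:
                continue
            assigned[i] = tid
            used.add(tid)
        ids = []
        for i, f in enumerate(features):
            tid = assigned.get(i)
            if tid is None:            # unmatched detection: a new pedestrian
                tid = next(self._next_id)
            self.tracks[tid] = list(f)
            ids.append(tid)
        return ids
```

Greedy matching in order of descending similarity keeps the sketch short. A production tracker would more likely use optimal (Hungarian) assignment and add motion cues to the similarity.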

Optionally, obtaining the predicted track of the target person based on the target candidate sequence and the historical position sequence includes:

Specifically, while a pedestrian is being tracked, if no associated object is found in N consecutive frame images, the trackID of the pedestrian or target person is preliminarily determined to be lost. N may be set according to the actual scene and is not limited herein. When the trackID is lost, the target person's motion trajectory must be predicted to determine whether the target person is occluded or has left the boundary. Predicting a single person's trajectory in isolation leads to large errors in the final prediction, so human motion trajectories are predicted with the starNet algorithm based on global information interaction. The algorithm forms a static map from the positions of all obstacles at each moment. As time passes, the static map becomes a dynamic map carrying temporal information. The dynamic map records the motion information of the obstacles in each region. This motion information reflects the joint influence of all obstacles rather than independent pairwise interactions. Because one global interaction map is shared and each individual queries it, the method achieves global interaction while limiting computational cost. The starNet network architecture is shown in FIG. 3. The network consists of a trajectory prediction network (Host Network) and a global temporal interaction computing network (Hub Network).
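The N-consecutive-frames rule for declaring a preliminary trackID loss can be sketched as a per-track miss counter. The starNet predictor itself is not sketched here. The class `LossMonitor` and its interface are hypothetical. It reports a trackID once, on the frame where its miss count first reaches N, and resets the count whenever the track is associated again.

```python
from typing import Dict, Iterable, List


class LossMonitor:
    """Flags a trackID as preliminarily lost after N consecutive unmatched frames."""

    def __init__(self, n_frames: int) -> None:
        self.n_frames = n_frames          # N, set according to the scene
        self.missed: Dict[int, int] = {}  # trackID -> consecutive misses

    def step(self, watched: Iterable[int], associated: Iterable[int]) -> List[int]:
        """Update counters for one frame; return trackIDs that just became lost."""
        matched = set(associated)
        newly_lost = []
        for tid in watched:
            if tid in matched:
                self.missed[tid] = 0
            else:
                self.missed[tid] = self.missed.get(tid, 0) + 1
                if self.missed[tid] == self.n_frames:
                    newly_lost.append(tid)  # trajectory prediction would start here
        return newly_lost
```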

With the pedestrian recognition method and device of the invention, the pedestrian's motion trajectory is predicted after the pedestrian's tracking identification (trackID) in the video frame images is lost. Trajectory prediction yields the pedestrian's positions over a future period, from which it can be determined whether the target person is occluded or has left the boundary. In addition, cross-validating the predicted track, the tracking result and the re-identification result allows the algorithms to be run in stages. This reduces the consumption of computing resources and improves accuracy.

Optionally, if the target person is occluded, then before the target candidate sequence is corrected and supplemented based on the predicted track of the target person, the method further includes:

Specifically, the position coordinates of all pedestrians are determined from the bounding boxes detected in the current frame image. Each pedestrian's position coordinates are compared in turn with the position coordinates in the target person's predicted track. If the distance between the two positions is smaller than a set threshold, the lost trackID is judged to have reappeared in the current frame, i.e., the target person's occlusion is not severe. If the distance is larger than the threshold, the target person's occlusion is judged to be severe.
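A minimal sketch of this occlusion test follows. The function name `occlusion_state` is hypothetical. Using Euclidean distance between box centers, and choosing the nearest detection rather than the first one below the threshold, are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def occlusion_state(predicted_pos: Tuple[float, float],
                    detections: Sequence[Box],
                    dist_threshold: float) -> Tuple[str, Optional[int]]:
    """Compare the predicted position with every detected pedestrian center.

    Returns ("not_severe", index) for the nearest detection closer than the
    threshold, or ("severe", None) when no detection is close enough.
    """
    best_i: Optional[int] = None
    best_d = math.inf
    for i, (x1, y1, x2, y2) in enumerate(detections):
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        d = math.hypot(cx - predicted_pos[0], cy - predicted_pos[1])
        if d < best_d:
            best_i, best_d = i, d
    if best_i is not None and best_d < dist_threshold:
        return "not_severe", best_i
    return "severe", None
```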

When the occlusion is severe, for example when the target person is completely blocked by a car or another object during pedestrian detection so that no corresponding bounding box can be obtained, the target person's bounding box at the current moment is predicted from two inputs: the target person's position information at the current moment in the predicted track, and the target person's bounding box at the previous moment in the video frame images.

When the occlusion is not severe, part of the target person's body is blocked during pedestrian detection. A corresponding bounding box is obtained, but it does not contain the target person's complete body. In this case, the target person's bounding box at the current moment is predicted from the target person's position information at the current moment in the predicted track, together with the target person's bounding boxes at the current moment and the previous moment in the actual frame images.

With the pedestrian recognition method and device of the invention, the pedestrian's motion trajectory is predicted after the pedestrian's trackID in the video frame images is lost. The occlusion state is then determined from the position information of the target person's predicted track and the pedestrian detection information of the current frame image. The target person's bounding box is predicted differently for each occlusion state, which provides the precondition for subsequent processing of the target candidate sequence.

Optionally, if the occlusion is severe, predicting the bounding box of the target person at the current moment based on the position information of the target person at the current moment in the predicted track and the bounding box of the target person at the previous moment includes:

determining a first width and a first height from the bounding box of the target person at the previous moment, and predicting the bounding box of the target person at the current moment based on the position information of the target person at the current moment in the predicted track, the predicted bounding box having the first width and the first height.

Optionally, if the occlusion is not severe, predicting the bounding box of the target person at the current moment based on the position information of the target person at the current moment in the predicted track, and on the bounding boxes of the target person at the current moment and the previous moment in the actual frame images, includes:

Specifically, the position information from the target person's predicted track is compared with the position information of the current frame's detection results, and the distance between the predicted position and each detected position is calculated. If the distance is smaller than a set threshold, the lost trackID is judged to be present in the current frame with occlusion that is not severe. Otherwise, the occlusion is judged to be severe;

If track prediction indicates that the target person's occlusion is severe, the predicted bounding box of the target person is calculated from the aspect ratio of the bounding box in the video frame at the previous moment and the target person's predicted position coordinates (see Formula 2). If the occlusion is not severe, the pedestrian detection algorithm can still detect a bounding box covering the visible part of the pedestrian's body. In that case, the predicted bounding box is calculated from three inputs: the aspect ratio of the bounding box detected in the previous video frame, the width or height of the pedestrian bounding box detected at the current moment, and the target person's predicted center coordinates (see Formula 4).

Assume that the pedestrian detection algorithm represents the target person's bounding box at the previous moment as box1 = (x1, y1, x2, y2). It is converted to the form box2 = (x, y, w, h), where w is the first width and h is the first height. The two representations are related by Formula 1:

Assuming that the center coordinates of the target person's predicted position are (xc, yc), the target person's predicted bounding box under severe occlusion, box3 = (x3, y3, x4, y4), is calculated as follows:

When the occlusion is not severe, the bounding box detected for the target person at the current moment is box = (x5, y5, x6, y6). It is converted to the form box0 = (x, y, w, h), where w is the second width and h is the second height, with the following correspondence:

The target person's predicted bounding box box4 = (x7, y7, x8, y8) is calculated as follows:

where the two aspect-ratio terms denote the first aspect ratio and the second aspect ratio, respectively.
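The following Python sketch shows one plausible reading of Formulas 1 to 4. It is an illustration under stated assumptions, not the patent's exact formulas. The assumptions are:

- (x, y) in box2 and box0 denotes the box center.
- Aspect ratio means width divided by height.
- In the not-severe case, whichever dimension of the detected box looks truncated relative to the previous aspect ratio is rebuilt from the other, detected dimension.

All function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): two diagonal corners


def to_cxcywh(box: Box) -> Tuple[float, float, float, float]:
    """Corner form -> (center x, center y, width, height); assumed Formulas 1 and 3."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def from_center(xc: float, yc: float, w: float, h: float) -> Box:
    """Box of size w x h centered on the predicted position (xc, yc)."""
    return xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


def predict_box_severe(prev_box: Box, center: Tuple[float, float]) -> Box:
    """Severe occlusion: reuse the first width and first height (assumed Formula 2)."""
    _, _, w1, h1 = to_cxcywh(prev_box)
    return from_center(center[0], center[1], w1, h1)


def predict_box_not_severe(prev_box: Box, cur_box: Box,
                           center: Tuple[float, float]) -> Box:
    """Partial occlusion: rebuild the truncated side from the previous aspect ratio."""
    _, _, w1, h1 = to_cxcywh(prev_box)
    _, _, w2, h2 = to_cxcywh(cur_box)
    r1, r2 = w1 / h1, w2 / h2  # first and second aspect ratios (width / height)
    if r2 > r1:
        w, h = w2, w2 / r1      # relatively wider now: assume the height was cut off
    else:
        w, h = h2 * r1, h2      # relatively narrower now: assume the width was cut off
    return from_center(center[0], center[1], w, h)
```

For example, with a previous box (0, 0, 10, 20) and a partially occluded current detection (0, 0, 10, 10), the second aspect ratio (1.0) exceeds the first (0.5). The sketch therefore keeps the detected width of 10 and restores a height of 20 around the predicted center.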

Optionally, if the target person is occluded, correcting and supplementing the target candidate sequence based on the predicted track of the target person includes:

Specifically, occlusion of pedestrians can confuse the tracked targets during target tracking. To prevent this, the method cross-validates tracking, trajectory prediction and human re-identification. Target tracking runs continuously throughout system operation. For example, the trajectory prediction algorithm may use 5 s of historical video frames to predict a pedestrian's motion trajectory over the next 3 s of video, or use 10 s of historical frames to predict the next 10 s. The target person's predicted track is compared with the track obtained by tracking the pedestrian in order to correct the target candidate sequence. When a trackID is lost, the pedestrians' positions, i.e., the center coordinates of their bounding boxes, are determined from the bounding boxes obtained by actual pedestrian detection and tracking. The coordinates in the target person's predicted track are then compared with the center coordinates of the actually detected pedestrians. If, for a preset number of positions, the distance between the target person's predicted position and the same pedestrian's detected position is smaller than a preset threshold, the target person's predicted track is judged to follow the same trend as that pedestrian's track. The target candidate sequence containing the target person is then corrected using the target person's predicted track.
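The trend comparison can be sketched as follows. For each tracked pedestrian, count the frames on which its detected center stays within a distance threshold of the target's predicted center. The function `match_trend` and the per-frame dictionary layout are assumptions, as is the rule of picking the pedestrian with the most matching frames. The returned trackID identifies the track that would then be linked to the target person when the candidate sequence is corrected.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def match_trend(predicted: Dict[int, Point],
                tracked: Dict[int, Dict[int, Point]],
                dist_threshold: float,
                min_matches: int) -> Optional[int]:
    """predicted: frame -> predicted center of the target person.
    tracked: trackID -> {frame -> detected center}.
    Returns the trackID that stays within dist_threshold of the prediction on at
    least min_matches frames (most matching frames wins), otherwise None."""
    best_id: Optional[int] = None
    best_count = 0
    for tid, traj in tracked.items():
        count = sum(
            1 for f, (px, py) in predicted.items()
            if f in traj and math.hypot(traj[f][0] - px, traj[f][1] - py) < dist_threshold)
        if count >= min_matches and count > best_count:
            best_id, best_count = tid, count
    return best_id
```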

Based on the corrected target candidate sequence, the REID algorithm is used to match the pedestrians in the target candidate sequence against the person object in the target query graph. If no pedestrian in the target candidate sequence can be matched with the person object, i.e., no similarity reaches the preset threshold and the person object in the target query graph cannot be identified, the position corresponding to the target person is retrieved from the historical position sequence and the bounding box of the tracked target is determined through trajectory prediction. The IOU (Intersection over Union) between this bounding box and the bounding box of each tracked pedestrian is then computed, and the pedestrian with the larger IOU is selected as the correct target person. This completes the verification of the corrected target candidate sequence.
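The IOU fallback can be illustrated with a short sketch. It assumes `(x1, y1, x2, y2)` boxes and a dictionary mapping trackIDs of tracked pedestrians to their current boxes; `select_by_iou` is a hypothetical name, and returning None when nothing overlaps is a choice made for this sketch.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def select_by_iou(predicted_box: Box, tracked: Dict[int, Box]) -> Optional[int]:
    """Pick the tracked pedestrian whose box overlaps the predicted box
    the most; None when no box overlaps it at all."""
    best_id, best_score = None, 0.0
    for track_id, box in tracked.items():
        score = iou(predicted_box, box)
        if score > best_score:
            best_id, best_score = track_id, score
    return best_id
```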

After the corrected target candidate sequence has been verified as correct, the human motion trajectory is predicted with the starNet algorithm based on the target candidate sequence and the historical position sequence, yielding the predicted trajectory of the target person;

and the target candidate sequence of the target person is supplemented based on the position information in the predicted trajectory and the bounding box of the target person. With this cross-validation scheme, the algorithms can be invoked in stages, which reduces the consumption of computing resources, minimizes errors caused by false detections, and reduces the influence of scene complexity on algorithm accuracy.
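The supplementation step can be illustrated with a hypothetical helper that fills the frames of the trackID-loss period with boxes placed at the predicted centers. Reusing the size of the last known box is an assumption made for this sketch; the frame-indexed dictionary layout and the function names are likewise illustrative.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout


def box_at_center(box: Box, center: Point) -> Box:
    """Move `box` so that it is centered on `center`, keeping its size."""
    w, h = box[2] - box[0], box[3] - box[1]
    cx, cy = center
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def supplement_sequence(sequence: Dict[int, Box],
                        predicted: Dict[int, Point]) -> Dict[int, Box]:
    """Fill frames missing from `sequence` (frame index -> box) with boxes
    placed at the predicted centers, reusing the size of the most recent
    earlier box. Frames already present are left untouched."""
    filled = dict(sequence)
    for frame in sorted(predicted):
        if frame in filled:
            continue
        earlier = [f for f in filled if f < frame]
        if not earlier:
            continue  # no reference box to take the size from
        filled[frame] = box_at_center(filled[max(earlier)], predicted[frame])
    return filled
```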

Optionally, identifying the target person using the REID algorithm and face recognition based on the target candidate sequence during the loss of the trackID includes:

Specifically, a target query graph is obtained, which contains the human-body and facial feature information of the person object to be searched for. Based on the target candidate sequence, human-body recognition is performed with the REID algorithm: the features of the person object in the target query graph are compared with the features of the pedestrians in the target candidate sequence to obtain a similarity ranking of the pedestrians, and it is judged whether the similarity meets a preset threshold. If so, the highest-ranked pedestrian is tentatively taken as the target person; if not, no pedestrian in the target candidate sequence matches the person object in the target query graph. In subsequent video frames, if a face is captured, face comparison is performed between the 5 top-ranked pedestrians in the target candidate sequence and the person object in the target query graph, after which the target person is finally determined. Once the target person is determined, the coordinate positions of the target person in a plurality of consecutive frames are saved as the historical position sequence.
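The two-stage identification can be sketched as below, assuming the REID and face models have already produced body feature vectors and face similarity scores (those models are not shown). Cosine similarity is a common choice for comparing REID embeddings but is an assumption here, as are the function names and the score dictionaries; `top_k=5` mirrors the 5 top-ranked pedestrians in the text.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Ranking = List[Tuple[int, float]]  # (trackID, similarity), best first


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query: Sequence[float],
                    candidates: Dict[int, Sequence[float]]) -> Ranking:
    """Rank candidate pedestrians by body-feature similarity to the query."""
    scores = [(tid, cosine_similarity(query, feat))
              for tid, feat in candidates.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


def tentative_target(ranking: Ranking, threshold: float) -> Optional[int]:
    """Top-ranked pedestrian if it meets the REID threshold, else None."""
    if ranking and ranking[0][1] >= threshold:
        return ranking[0][0]
    return None


def confirm_with_face(ranking: Ranking, face_scores: Dict[int, float],
                      face_threshold: float, top_k: int = 5) -> Optional[int]:
    """Among the top_k REID candidates whose faces were captured, return the
    one with the highest face similarity meeting `face_threshold`
    (ties go to the better REID rank)."""
    eligible = [(tid, face_scores[tid]) for tid, _ in ranking[:top_k]
                if tid in face_scores and face_scores[tid] >= face_threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda e: e[1])[0]
```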

The overall flow of the present invention will be described below. Fig. 4 is an overall flowchart of a pedestrian recognition method provided by the present invention, as shown in fig. 4:

Step 401, detecting and tracking pedestrians. In the pedestrian detection stage, all pedestrians are detected in the frame images of the video, a trackID is assigned to each new pedestrian, and a bounding box is determined from the body extent of each pedestrian. In the pedestrian tracking stage, the probability that two objects belong to the same target is determined by comparing the similarity of the objects in different bounding boxes; if this probability exceeds a preset threshold, the two objects are associated and assigned the same trackID, and the target candidate sequence is generated.
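The association rule of step 401 can be sketched as a greedy tracker. The text compares object similarity in general; this sketch substitutes box IOU as the similarity score (an assumption; appearance-based scores could be passed through the `similarity` parameter), and `GreedyTracker` is a hypothetical name. Pedestrian detection itself is not shown.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over Union, used here as a stand-in similarity score."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyTracker:
    """Assign trackIDs by greedily associating detections with existing
    tracks whose similarity exceeds `threshold`; unmatched detections are
    treated as new pedestrians and receive fresh trackIDs."""

    def __init__(self, threshold: float = 0.3,
                 similarity: Callable[[Box, Box], float] = iou) -> None:
        self.threshold = threshold
        self.similarity = similarity
        self.tracks: Dict[int, Box] = {}  # trackID -> most recent box
        self._next_id = itertools.count(1)

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        """Associate one frame's detections; returns trackID -> box."""
        pairs = sorted(((self.similarity(tbox, dbox), tid, di)
                        for tid, tbox in self.tracks.items()
                        for di, dbox in enumerate(detections)), reverse=True)
        used_tracks, used_dets, assigned = set(), set(), {}
        for score, tid, di in pairs:
            if score <= self.threshold:
                break  # remaining pairs are even less similar
            if tid in used_tracks or di in used_dets:
                continue
            used_tracks.add(tid)
            used_dets.add(di)
            assigned[tid] = detections[di]
        for di, dbox in enumerate(detections):
            if di not in used_dets:
                assigned[next(self._next_id)] = dbox
        self.tracks.update(assigned)
        return assigned
```

Greedy matching in descending score order keeps the sketch short; an optimal one-to-one assignment is a common alternative when many pedestrians overlap.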

Step 402, determining the target pedestrian. The REID algorithm computes the similarity between the target query graph and the pedestrians in the target candidate sequence, and the highest-ranked pedestrian is tentatively taken as the target person, i.e., the person object in the target query graph. In subsequent video frames, if a face is captured, the 5 top-ranked pedestrians by similarity are compared with the face image for face recognition, and the target person is finally determined. After the target person is determined, the coordinate positions of the target person in a preset number of consecutive frame images are saved as the historical position sequence.

Step 403, preliminarily judging occlusion or out-of-range. If a trackID has no associated object in N consecutive frame images during tracking, the trackID of the pedestrian or target person is judged to be lost, where N can be set according to actual conditions.
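The N-frame rule can be captured by a small counter per trackID. In this hypothetical sketch, `known_ids` and `matched_ids` would come from the tracker each frame; reporting a trackID only on the frame where its miss count reaches N is a design choice, so that downstream trajectory prediction is triggered once per loss.

```python
from typing import Dict, Iterable, Set


class LossMonitor:
    """Flag a trackID as lost once it has gone `n_frames` consecutive frame
    images without an associated object."""

    def __init__(self, n_frames: int) -> None:
        self.n_frames = n_frames
        self.misses: Dict[int, int] = {}  # trackID -> consecutive misses

    def update(self, known_ids: Iterable[int],
               matched_ids: Iterable[int]) -> Set[int]:
        """Process one frame; return the trackIDs that became lost in it."""
        matched = set(matched_ids)
        newly_lost = set()
        for tid in known_ids:
            if tid in matched:
                self.misses[tid] = 0  # associated again: reset the count
            else:
                self.misses[tid] = self.misses.get(tid, 0) + 1
                if self.misses[tid] == self.n_frames:
                    newly_lost.add(tid)
        return newly_lost
```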

Step 404, predicting the pedestrian trajectory based on global information interaction. After the trackID of the target person is preliminarily judged to be lost, the predicted trajectory of the target person is determined with the starNet algorithm based on the target candidate sequence obtained in step 401 and the historical position sequence obtained in step 402; the predicted trajectory includes the position information of the target person during the loss of the trackID.
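The starNet model itself is not reproduced here. As a clearly labeled stand-in, the sketch below extrapolates future centers at constant velocity from the historical position sequence. This is only a simple baseline, not the global-information-interaction prediction the text describes; it merely illustrates the input (a history of center coordinates) and the output (positions for the trackID-loss period) of step 404. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def predict_constant_velocity(history: Sequence[Point],
                              horizon: int) -> List[Point]:
    """Extrapolate `horizon` future centers using the mean per-frame
    displacement over `history` (a baseline stand-in, not starNet)."""
    if len(history) < 2:
        raise ValueError("need at least two historical positions")
    steps = len(history) - 1
    vx = (history[-1][0] - history[0][0]) / steps
    vy = (history[-1][1] - history[0][1]) / steps
    x, y = history[-1]
    return [(x + vx * k, y + vy * k) for k in range(1, horizon + 1)]
```

At an assumed 25 frames per second, the 5 s history and 3 s horizon mentioned earlier would correspond to about 125 historical positions and `horizon=75`.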

Step 405, predicting a human bounding box for the occluded target person. Whether the target person is occluded or has left the boundary is judged according to the predicted trajectory of the target person; if the target person is occluded, it is further judged whether the occlusion is severe.

If the occlusion is severe, the bounding box of the target person at the current moment is predicted from the bounding box of the target person at the previous moment and the predicted center coordinate.

If the occlusion is not severe, the bounding box of the target person at the current moment is predicted from the bounding box of the target person at the previous moment, the bounding box detected at the current moment, and the predicted center coordinate.
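The two cases of step 405 can be sketched as follows. The exact bounding-box formula (which involves the first and second aspect ratios) is not given in this section, so the sketch makes explicit assumptions: occlusion is treated as severe when no current detection lies within `max_dist` of the predicted center, the severe case keeps the previous box size, and the non-severe case blends the previous and currently detected box sizes with a hypothetical weight `alpha`. In both cases the box is centered on the predicted center.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout


def center(b: Box) -> Point:
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)


def make_box(c: Point, w: float, h: float) -> Box:
    """Box of size w x h centered on c."""
    return (c[0] - w / 2, c[1] - h / 2, c[0] + w / 2, c[1] + h / 2)


def nearest_detection(pred_center: Point, detections: Sequence[Box],
                      max_dist: float) -> Optional[Box]:
    """Closest current detection within max_dist of the predicted center."""
    best, best_d = None, max_dist
    for d in detections:
        dist = math.dist(pred_center, center(d))
        if dist <= best_d:
            best, best_d = d, dist
    return best


def predict_box(prev_box: Box, pred_center: Point, detections: Sequence[Box],
                max_dist: float, alpha: float = 0.5) -> Tuple[str, Box]:
    """Return (occlusion state, predicted box) for the current frame."""
    pw, ph = prev_box[2] - prev_box[0], prev_box[3] - prev_box[1]
    current = nearest_detection(pred_center, detections, max_dist)
    if current is None:
        # Severe: no usable detection, keep the previous size.
        return "severe", make_box(pred_center, pw, ph)
    cw, ch = current[2] - current[0], current[3] - current[1]
    # Not severe: blend previous and currently detected sizes (assumed rule).
    w = alpha * cw + (1 - alpha) * pw
    h = alpha * ch + (1 - alpha) * ph
    return "not_severe", make_box(pred_center, w, h)
```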

Step 406, cross-validation of tracking, prediction, and re-identification. The target candidate sequence is corrected by comparing the positions in the predicted trajectory of the target person with the trajectories of the actually tracked pedestrians. Based on the corrected target candidate sequence, the REID algorithm is used again to identify the person object in the target query graph. If no matching object is found, the position information corresponding to the target person is retrieved from the historical position sequence, the corresponding bounding box is obtained through trajectory prediction, the IOU (intersection over union) between the bounding box of the target person in the predicted trajectory and the bounding box of each actually tracked pedestrian is calculated, and the pedestrian whose bounding box has the larger IOU is selected as the target person.

Step 407, video stitching for the out-of-range case. If, according to the predicted trajectory of the target person, the position of the target person exceeds the range of the video, the target person is judged to have left the boundary, i.e., gone out of range. The Parallax-Robust Surveillance Video Stitching algorithm is then used to stitch the video with the video files of spatially adjacent cameras that share the predicted start time; steps 401-402 are repeated on the stitched video to re-identify the person object in the target query graph.
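The out-of-range test and the choice of which camera videos to stitch can be sketched as below. The `Camera` record, its `neighbors` adjacency set, the seconds-based start time, and the tolerance `tol` are all hypothetical; the Parallax-Robust Surveillance Video Stitching step itself is not implemented.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Camera:
    cam_id: str
    start_time: float          # video start time in seconds (assumed unit)
    neighbors: FrozenSet[str]  # IDs of spatially adjacent cameras


def leaves_frame(predicted: Sequence[Point], width: int, height: int) -> bool:
    """True if any predicted position falls outside the video frame."""
    return any(not (0 <= x < width and 0 <= y < height) for x, y in predicted)


def cameras_to_stitch(current: Camera, cameras: Dict[str, Camera],
                      predicted_start: float, tol: float = 0.5) -> List[str]:
    """Adjacent cameras whose videos start at the predicted start time
    (within `tol` seconds), sorted by ID."""
    return sorted(cid for cid in current.neighbors
                  if cid in cameras
                  and abs(cameras[cid].start_time - predicted_start) <= tol)
```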

Fig. 5 is a schematic structural diagram of a pedestrian recognition device according to the present invention. As shown in Fig. 5, the pedestrian recognition device includes a memory 520, a transceiver 510, and a processor 500, where the processor 500 and the memory 520 may be physically separated.

A memory 520 for storing a computer program, and a transceiver 510 for transceiving data under the control of the processor 500.


Wherein in fig. 5, a bus architecture may comprise any number of interconnected buses and bridges, and in particular one or more processors represented by processor 500 and various circuits of memory represented by memory 520, linked together. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators, power management circuits, etc., all as are well known in the art and, therefore, will not be described further herein. The bus interface provides an interface. The transceiver 510 may be a number of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatus over transmission media, including wireless channels, wired channels, optical cables, and the like.

The processor 500 is responsible for managing the bus architecture and general processing, and the memory 520 may store data used by the processor 500 in performing operations.

The processor 500 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a complex programmable logic device (CPLD), and the processor may also employ a multi-core architecture.

The processor 500 is operable to perform any of the methods provided by the present invention in accordance with the obtained executable instructions by invoking a computer program stored in the memory 520, for example:

identifying the target personnel by adopting an REID algorithm and face recognition based on a target candidate sequence in the loss period of the trackID;

It should be noted that, the pedestrian recognition device provided by the present invention can implement all the method steps implemented by the pedestrian recognition method embodiment, and can achieve the same technical effects, and the same parts and beneficial effects as those of the method embodiment in the present embodiment are not described in detail herein.

Fig. 6 is a schematic structural view of a pedestrian recognition device provided by the invention. As shown in fig. 6, the apparatus includes:

The predicted track module 601 is configured to, after determining that the tracking identifier trackID of the target person is lost in the tracking process of a frame image in the target video, obtain a predicted trajectory of the target person based on a target candidate sequence and a historical position sequence;

A state confirmation module 602, configured to determine, based on the predicted trajectory of the target person, whether the target person is occluded or has left the boundary;

The correction and supplementation module 603 is configured to correct and supplement the target candidate sequence based on the predicted trajectory of the target person if the target person is occluded;

The generating module 604 is configured to, if the target person leaves the boundary, stitch the videos of spatially adjacent cameras that share the predicted start time, obtain the tracking identifier trackID of each pedestrian based on the stitched video frame images, and generate a target candidate sequence;

The identifying module 605 is configured to identify the target person by using an REID algorithm and face recognition based on a target candidate sequence during a loss period of the trackID;

The generating module 604 is further configured to detect pedestrians on a frame image in the target video, and determine a trackID and a bounding box of each pedestrian;

The predicted track module 601 is further configured to perform track prediction of the target person by combining the target candidate sequence and the historical position sequence based on a starNet algorithm of global information interaction;

The predicted track module 601 is further configured to determine an occlusion state based on the position information of the predicted trajectory of the target person and the pedestrian detection information of the current frame image, where the occlusion state is either severe or not severe;

The correction and supplementation module 603 is further configured to, if the occlusion state is severe, predict the bounding box of the target person at the current moment based on the position information of the target person at the current moment in the predicted trajectory and the bounding box of the target person at the previous moment, including:

The correction and supplementation module 603 is further configured to, if the occlusion state is not severe, predict the bounding box of the target person at the current moment based on the position information of the target person at the current moment in the predicted trajectory and the bounding boxes of the target person at the current and previous moments in the actual frame images, including:

The correction and supplementation module 603 is further configured to correct and supplement the target candidate sequence based on the predicted trajectory of the target person if the target person is occluded, including:

The identifying module 605 is further configured to identify the target person using the REID algorithm and face recognition based on the target candidate sequence during the loss of the trackID, including:

It should be noted that the division of the units in the present invention is illustrative, and is merely a logic function division, and other division manners may be implemented in practice. In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a processor-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.

It should be noted that, the device provided by the present invention can implement all the method steps implemented by the method embodiment and achieve the same technical effects, and the parts and beneficial effects that are the same as those of the method embodiment in the present embodiment are not described in detail herein.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing the steps of the method of pedestrian recognition provided by the above methods, for example comprising:

the target candidate sequence is obtained by detecting and tracking pedestrians of frame images in a target video and comprises a trackID and a bounding box of each pedestrian;

In another aspect, the present invention also provides a processor readable storage medium storing a computer program for causing the processor to perform the pedestrian recognition method provided in the above embodiments, for example, including:

The processor-readable storage medium may be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic storage (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MO), etc.), optical storage (e.g., CD, DVD, BD, HVD, etc.), and semiconductor storage (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND flash), solid-state drives (SSD), etc.).

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-executable instructions. These computer-executable instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These processor-executable instructions may also be stored in a processor-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the processor-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These processor-executable instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A method of pedestrian recognition, comprising:

After determining that tracking identification (trackID) of a target person is lost in the tracking process of a frame image in a target video, carrying out track prediction of the target person by combining a target candidate sequence and a historical position sequence based on a starNet algorithm of global information interaction, wherein the track prediction of the target person comprises position information;

2. The method of pedestrian recognition according to claim 1, wherein the method of acquiring the tracking identification trackID of the pedestrian includes:

3. The method for identifying pedestrians according to claim 1, wherein if the target person is blocked, before correcting and supplementing the target candidate sequence based on the predicted trajectory of the target person, further comprising:

4. The method for pedestrian recognition according to claim 3, wherein if the occlusion state is severe, predicting the bounding box of the target person at the current time based on the position information of the target person at the current time and the bounding box of the target person at the previous time in the predicted trajectory, includes:

5. The method of pedestrian recognition according to claim 3, wherein, if the target person is occluded, correcting and supplementing the target candidate sequence based on the predicted trajectory of the target person comprises:

6. The method for identifying pedestrians according to claim 1, wherein the identifying the target person based on the target candidate sequence during the loss of the trackID using the REID algorithm and face recognition includes:

7. An electronic device for pedestrian recognition, comprising a memory, a transceiver, and a processor;

After determining that tracking identification (trackID) of a target person is lost in the tracking process of a frame image in a target video, carrying out track prediction of the target person by combining the target candidate sequence and a historical position sequence based on a starNet algorithm of global information interaction, wherein the track prediction of the target person comprises position information;

8. An apparatus for pedestrian recognition, the apparatus comprising:

The prediction track module is configured to, after determining that the tracking identification (trackID) of the target person is lost in the tracking process of a frame image in the target video, perform trajectory prediction of the target person by combining the target candidate sequence and the historical position sequence based on the starNet algorithm of global information interaction, wherein the trajectory prediction of the target person comprises position information;

The identification module is used for identifying the target personnel by adopting an REID algorithm and face recognition based on the target candidate sequence in the loss period of the trackID;

9. A processor-readable storage medium, characterized in that the processor-readable storage medium stores a computer program for causing the processor to execute the method of pedestrian recognition according to any one of claims 1 to 6.