
CN114638817B - Image segmentation method and device, electronic equipment and storage medium - Google Patents

Image segmentation method and device, electronic equipment and storage medium

Info

Publication number
CN114638817B
CN114638817B (application CN202210322862.0A)
Authority
CN
China
Prior art keywords
image
human body
target
movement
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210322862.0A
Other languages
Chinese (zh)
Other versions
CN114638817A (en)
Inventor
陈如婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202210322862.0A priority Critical patent/CN114638817B/en
Publication of CN114638817A publication Critical patent/CN114638817A/en
Application granted granted Critical
Publication of CN114638817B publication Critical patent/CN114638817B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract


This disclosure relates to an image segmentation method, apparatus, electronic device, and storage medium. The method includes: acquiring motion parameters corresponding to each human bounding box in a first global image, the motion parameters representing the motion pattern or trend of the human bounding box; scaling each human bounding box according to the motion parameters to obtain a target bounding box corresponding to the human bounding box; determining a first target image based on the target bounding boxes corresponding to each human bounding box and the first global image; and performing image segmentation on the first target image to obtain the human segmentation result of the first global image. Embodiments of this disclosure improve the image segmentation effect when the human body is moving.

Description

Image segmentation method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular, to an image segmentation method and device, an electronic device and a storage medium.
Background
Background segmentation is an important problem in the fields of computer vision and smart homes. Background segmentation models can be used in many fields; for example, in a home entertainment scene where multiple persons interact, the portraits and the background can be separated by a background segmentation model so that a monotonous background can be replaced. In background segmentation, a region of interest (e.g., a region including a portrait) is generally selected from a global image as a target image, and the target image is then subjected to image segmentation.
During image segmentation, the motion of the human body may cause the portrait to leave the target image, or may degrade the segmentation details of the portrait, resulting in a poor segmentation result. Therefore, how to improve the image segmentation effect while the human body is moving is an urgent problem to be solved.
Disclosure of Invention
The disclosure provides a technical scheme for an image segmentation method and apparatus, an electronic device, and a storage medium.
According to one aspect of the disclosure, an image segmentation method is provided, including: acquiring a movement parameter corresponding to each human body frame in a first global image, where the movement parameter represents a movement rule or movement trend of the human body frame; scaling each human body frame according to its movement parameter to obtain a target frame corresponding to the human body frame; determining a first target image according to the target frames corresponding to the human body frames and the first global image; and performing image segmentation on the first target image to obtain a human body segmentation result of the first global image.
The image segmentation method provided by the embodiment of the disclosure can be applied to image segmentation in a single-person scene as well as in a multi-person scene. In the embodiment of the disclosure, the size of the target image can be adjusted in real time according to the movement rule or movement trend of the human body, so that the target image keeps up with the movement of the human body in time. This reduces both the probability that the portrait leaves the target image and the probability that the proportion of pixels occupied by the portrait in the target image becomes too small while the human body moves, effectively improving the image segmentation effect when the human body is moving.
In one possible implementation manner, obtaining the movement parameter corresponding to each human body frame in the first global image includes: performing target detection on the first global image to obtain a target detection result of the first global image, where the target detection result indicates the position of each human body frame included in the first global image; determining, according to the target detection result, scene information of the first global image and distance information corresponding to each human body frame, where the scene information indicates whether the first global image is a global image in a single-person scene or in a multi-person scene, the distance information indicates the distance between the human body in the human body frame and a first image acquisition device, and the first image acquisition device is used to acquire the first global image; and, for each human body frame, obtaining the movement parameter corresponding to the human body frame according to the scene information, the distance information corresponding to the human body frame, and a first preset mapping relation, where the first preset mapping relation indicates the movement parameters corresponding to human body frames at different distances in different scenes.
In the embodiment of the disclosure, the movement parameters corresponding to the human body frames in the first global image are obtained based on the scene information of the first global image and the distance information corresponding to the human body frames in the first global image, so that conditions are provided for scaling of the human body frames, and the image segmentation effect during human body movement is improved.
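As a minimal sketch of the first preset mapping relation described above, the lookup can be keyed by scene type and a quantized distance. All bucket boundaries and parameter values below are illustrative assumptions, not values from the disclosure:

```python
# Sketch of the first preset mapping relation:
# (scene, distance bucket) -> movement parameter.
# Bucket boundaries and parameter values are illustrative assumptions.

def distance_bucket(distance_m: float) -> str:
    """Quantize the human-to-camera distance into coarse buckets."""
    if distance_m < 1.5:
        return "near"
    if distance_m < 3.0:
        return "mid"
    return "far"

# Larger movement parameters stand for faster / wider expected motion;
# a single-person scene is assigned larger values than a multi-person scene.
FIRST_PRESET_MAPPING = {
    ("single", "near"): 0.8,
    ("single", "mid"): 0.6,
    ("single", "far"): 0.4,
    ("multi", "near"): 0.4,
    ("multi", "mid"): 0.3,
    ("multi", "far"): 0.2,
}

def movement_parameter(scene: str, distance_m: float) -> float:
    """Look up the movement parameter for one human body frame."""
    return FIRST_PRESET_MAPPING[(scene, distance_bucket(distance_m))]
```

In this sketch the mapping is a static table; in practice it would be built from the tracked video statistics described below.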
In one possible implementation manner, the method further includes: acquiring a first video, where the first video corresponds to a first scene, the first scene is a single-person scene or a multi-person scene, the first video records the movement of a single person or of multiple persons within a first movement range, the first movement range is a first distance away from a second image acquisition device, and the second image acquisition device is used to acquire the first video; performing limb tracking on a target person in the first video to obtain the position of the human body frame corresponding to the target person in each frame image of the first video; determining a movement speed, a movement amplitude, and a second distance of the target person according to those positions, where the second distance indicates the distance between the target person and a reference position of the first video; obtaining a first movement parameter according to the movement speed and the movement amplitude of the target person; and establishing the first preset mapping relation based on the first scene, the second distance, and the first movement parameter.
In the embodiment of the disclosure, the historical images are analyzed through limb tracking, so that the mapping relation between the positions of the scene and the target person and the movement parameters is obtained, and a basis is provided for determining the movement parameters corresponding to the human frame in the first global image.
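The movement speed and movement amplitude mentioned above can be derived from the per-frame box positions produced by limb tracking. The statistics chosen below (mean step size for speed, positional range for amplitude, horizontal motion only) are illustrative assumptions:

```python
# Sketch of deriving movement speed and amplitude from the tracked per-frame
# box centers of the target person (pixel coordinates, fixed frame rate).
# The specific statistics are illustrative assumptions.

def movement_stats(centers_x, fps):
    """Return (speed, amplitude) of horizontal motion from per-frame x-centers.

    speed     : average per-frame displacement scaled to pixels/second
    amplitude : total positional range covered over the clip, in pixels
    """
    steps = [abs(b - a) for a, b in zip(centers_x, centers_x[1:])]
    speed = sum(steps) / len(steps) * fps if steps else 0.0
    amplitude = max(centers_x) - min(centers_x)
    return speed, amplitude
```

A first movement parameter could then be obtained by normalizing and combining these two values.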
In one possible implementation manner, the method further includes: when the first scene is a single-person scene, setting the movement frequency of the person in the first video to be greater than a first frequency threshold and the length of the projection of the first movement range in the x-axis direction of the camera coordinate system of the second image acquisition device to be greater than a first movement threshold; and when the first scene is a multi-person scene, setting the movement frequency of each person in the first video to be less than or equal to a second frequency threshold and the length of the projection of the first movement range in the x-axis direction of the camera coordinate system of the second image acquisition device to be less than or equal to a second movement threshold, where the second frequency threshold is less than or equal to the first frequency threshold and the second movement threshold is less than or equal to the first movement threshold.
In the embodiment of the disclosure, setting a larger movement range and movement frequency for a single-person scene and a smaller movement range and movement frequency for a multi-person scene makes the movement rule of a person fit the actual scene, which improves the accuracy of the movement parameters corresponding to the human body frames and thus the image segmentation effect during movement of the human body.
In one possible implementation manner, scaling the human body frame according to its movement parameter to obtain the corresponding target frame includes: determining a scaling coefficient of the human body frame according to the movement parameter corresponding to the human body frame and a second preset mapping relation, where the second preset mapping relation indicates the scaling coefficients corresponding to different movement parameters; and scaling the human body frame according to the scaling coefficient to obtain the target frame corresponding to the human body frame.
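A minimal sketch of this step follows. The linear form of the second preset mapping and the choice to scale the box about its center are illustrative assumptions:

```python
# Sketch of obtaining a target frame by scaling a human body frame.
# The formula for the coefficient and the center-anchored scaling
# are illustrative assumptions.

def scaling_coefficient(movement_param: float) -> float:
    """Second preset mapping: faster expected motion -> larger margin."""
    return 1.0 + 0.5 * movement_param  # assumed linear form

def scale_box(box, coeff):
    """Scale an (x1, y1, x2, y2) box about its center by `coeff`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = (x2 - x1) / 2.0 * coeff, (y2 - y1) / 2.0 * coeff
    return (cx - hw, cy - hh, cx + hw, cy + hh)
```

For example, a movement parameter of 1.0 would yield a coefficient of 1.5, enlarging a 10x10 box to 15x15 around the same center (the result would still need to be clipped to the image bounds).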
In a possible implementation manner, determining the first target image according to the target frames corresponding to the human body frames and the first global image includes: merging the target frames corresponding to the human body frames to obtain a merged frame; and obtaining the first target image according to the merged frame and the first global image, where the first target image corresponds to the merged frame with the smallest area among the merged frames capable of covering the target frames corresponding to all the human body frames.
In the embodiment of the disclosure, the first target image is obtained from the first global image based on the merged frame with the smallest area among the merged frames capable of covering the target frames corresponding to all the human body frames. In this way, the possibility that the proportion of pixels occupied by the portrait in the first target image is too low can be reduced, improving the image segmentation effect.
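For axis-aligned boxes, the smallest-area merged frame covering all target frames is simply their bounding union, which can be sketched as:

```python
# Sketch of the merge step: the axis-aligned union is the smallest rectangle
# covering all target frames, matching the "smallest-area merged frame"
# described above.

def merge_boxes(boxes):
    """Return the minimal axis-aligned box covering every (x1, y1, x2, y2) box."""
    x1 = min(b[0] for b in boxes)
    y1 = min(b[1] for b in boxes)
    x2 = max(b[2] for b in boxes)
    y2 = max(b[3] for b in boxes)
    return (x1, y1, x2, y2)
```

Cropping the first global image to this merged frame would then yield the first target image.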
In one possible implementation manner, performing image segmentation on the first target image to obtain the human body segmentation result of the first global image includes: acquiring a second target image corresponding to a second global image, where the second global image is the previous frame image of the first global image in a video and the second target image represents the target image adopted when the human body segmentation result of the second global image was obtained; determining the movement amplitude of the first target image relative to the second target image; and, when the movement amplitude of the first target image relative to the second target image is greater than a first amplitude threshold, performing image segmentation on the first target image to obtain the human body segmentation result of the first global image.
Therefore, the stability of the segmentation effect is improved by keeping the position of the target image relatively stable.
In one possible implementation manner, the method further includes: when the movement amplitude of the first target image relative to the second target image is less than or equal to the first amplitude threshold, performing image segmentation on the second target image to obtain the human body segmentation result of the first global image.
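This amplitude-based update rule can be sketched as follows. Measuring the movement amplitude as the shift of the target-image center is an illustrative assumption (the disclosure does not fix a particular distance measure):

```python
# Sketch of the update rule: reuse the previous target image unless the new
# target image has moved by more than the first amplitude threshold.
# Measuring movement as the center-point shift is an illustrative assumption.

def box_center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def should_update_target(prev_box, new_box, amplitude_threshold):
    """True when the new target image should replace the previous one."""
    (px, py), (nx, ny) = box_center(prev_box), box_center(new_box)
    amplitude = ((nx - px) ** 2 + (ny - py) ** 2) ** 0.5
    return amplitude > amplitude_threshold
```

Keeping the previous target image for small shifts avoids jitter in the crop position, which is the stability benefit noted above.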
In one possible implementation manner, performing image segmentation on the first target image to obtain the human body segmentation result of the first global image includes: acquiring a second target image corresponding to a second global image, where the second global image and the first global image belong to the same video, the second global image is the previous frame image of the first global image, and the second target image represents the target image adopted when the human body segmentation result of the second global image was obtained; determining the coverage rate of the first target image relative to the second target image; and, when the coverage rate is less than a second amplitude threshold, performing image segmentation on the first target image to obtain the human body segmentation result of the first global image.
Thus, the target image can be updated in time, and the image segmentation effect is improved.
In one possible implementation, the method further includes:
when the coverage rate is greater than or equal to the second amplitude threshold, performing image segmentation on the second target image to obtain the human body segmentation result of the first global image.
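The coverage-rate variant can be sketched as below. Defining the coverage rate as the intersection area divided by the new target image's area is an illustrative assumption:

```python
# Sketch of the coverage check: how much of the new target box already lies
# inside the previous target box. Defining coverage as intersection area over
# the new box's area is an illustrative assumption.

def coverage(new_box, prev_box):
    """Fraction of new_box's area covered by prev_box, in [0, 1]."""
    ix1, iy1 = max(new_box[0], prev_box[0]), max(new_box[1], prev_box[1])
    ix2, iy2 = min(new_box[2], prev_box[2]), min(new_box[3], prev_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (new_box[2] - new_box[0]) * (new_box[3] - new_box[1])
    return inter / area if area > 0 else 0.0

def should_resegment(new_box, prev_box, second_amplitude_threshold):
    """Segment the new target image only when coverage drops below the threshold."""
    return coverage(new_box, prev_box) < second_amplitude_threshold
```

Low coverage means the old crop no longer contains the person, so the target image is refreshed in time.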
According to an aspect of the present disclosure, there is provided an image segmentation apparatus including:
the first acquisition module, configured to acquire the movement parameter corresponding to each human body frame in a first global image, where the movement parameter represents the movement rule or movement trend of the human body frame;
the scaling module, configured to scale, for each human body frame, the human body frame according to the movement parameter acquired by the first acquisition module, to obtain the target frame corresponding to the human body frame;
the first determining module, configured to determine a first target image according to the target frames obtained by the scaling module and the first global image;
and the first segmentation module, configured to perform image segmentation on the first target image determined by the first determining module, to obtain a human body segmentation result of the first global image.
In one possible implementation manner, the first obtaining module is further configured to:
perform target detection on the first global image to obtain a target detection result of the first global image, where the target detection result indicates the position of each human body frame included in the first global image;
determine, according to the target detection result of the first global image, scene information of the first global image and distance information corresponding to each human body frame, where the scene information indicates whether the first global image is a global image in a single-person scene or in a multi-person scene, the distance information indicates the distance between the human body in the human body frame and a first image acquisition device, and the first image acquisition device is used to acquire the first global image;
and, for each human body frame, obtain the movement parameter corresponding to the human body frame according to the scene information, the distance information corresponding to the human body frame, and a first preset mapping relation, where the first preset mapping relation indicates the movement parameters corresponding to human body frames at different distances in different scenes.
In one possible implementation, the apparatus further includes:
the second acquisition module, configured to acquire a first video, where the first video corresponds to a first scene, the first scene is a single-person scene or a multi-person scene, the first video records the movement of a single person or of multiple persons within a first movement range, the first movement range is a first distance away from a second image acquisition device, and the second image acquisition device is used to acquire the first video;
the tracking module, configured to perform limb tracking on the target person in the first video to obtain the position of the human body frame corresponding to the target person in each frame image of the first video;
the second determining module, configured to determine the movement speed, the movement amplitude, and a second distance of the target person according to the positions of the human body frames corresponding to the target person in each frame image of the first video, where the second distance indicates the distance between the target person and the reference position of the first video;
the third acquisition module, configured to obtain a first movement parameter according to the movement speed and the movement amplitude of the target person;
the establishing module is configured to establish the first preset mapping relationship based on the first scene, the second distance, and the first movement parameter.
In one possible implementation, the apparatus further includes:
the first setting module, configured to set, when the first scene is a single-person scene, the movement frequency of the person in the first video to be greater than a first frequency threshold and the length of the projection of the first movement range in the x-axis direction of the camera coordinate system of the second image acquisition device to be greater than a first movement threshold;
the second setting module, configured to set, when the first scene is a multi-person scene, the movement frequency of each person in the first video to be less than or equal to a second frequency threshold and the length of the projection of the first movement range in the x-axis direction of the camera coordinate system of the second image acquisition device to be less than or equal to a second movement threshold;
wherein the second frequency threshold is less than or equal to the first frequency threshold and the second movement threshold is less than or equal to the first movement threshold.
In one possible implementation, the scaling module is further configured to:
determine a scaling coefficient of the human body frame according to the movement parameter corresponding to the human body frame and a second preset mapping relation, where the second preset mapping relation indicates the scaling coefficients corresponding to different movement parameters;
and scale the human body frame according to the scaling coefficient to obtain the target frame corresponding to the human body frame.
In one possible implementation manner, the first determining module is further configured to:
merge the target frames corresponding to the human body frames to obtain a merged frame;
and obtain the first target image according to the merged frame and the first global image,
where the first target image corresponds to the merged frame with the smallest area among the merged frames capable of covering the target frames corresponding to all the human body frames.
In one possible implementation, the first segmentation module is further configured to:
acquire a second target image corresponding to a second global image, where the second global image is the previous frame image of the first global image in a video and the second target image represents the target image adopted when the human body segmentation result of the second global image was obtained;
determine the movement amplitude of the first target image relative to the second target image;
and, when the movement amplitude of the first target image relative to the second target image is greater than a first amplitude threshold, perform image segmentation on the first target image to obtain the human body segmentation result of the first global image.
In one possible implementation, the apparatus further includes:
the second segmentation module, configured to perform image segmentation on the second target image when the movement amplitude of the first target image relative to the second target image is less than or equal to the first amplitude threshold, to obtain the human body segmentation result of the first global image.
In one possible implementation, the first segmentation module is further configured to:
acquire a second target image corresponding to a second global image, where the second global image and the first global image belong to the same video, the second global image is the previous frame image of the first global image, and the second target image represents the target image adopted when the human body segmentation result of the second global image was obtained;
determine the coverage rate of the first target image relative to the second target image;
and, when the coverage rate is less than a second amplitude threshold, perform image segmentation on the first target image to obtain the human body segmentation result of the first global image.
In one possible implementation, the apparatus further includes:
the third segmentation module, configured to perform image segmentation on the second target image when the coverage rate is greater than or equal to the second amplitude threshold, to obtain the human body segmentation result of the first global image.
According to an aspect of the disclosure, there is provided an electronic device comprising a processor and a memory for storing processor-executable instructions, wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
FIG. 1 illustrates a flow chart of an image segmentation method according to an embodiment of the present disclosure;
FIG. 2 illustrates an exemplary schematic diagram of a human body frame and a target frame in an embodiment of the present disclosure;
FIG. 3 illustrates an exemplary schematic diagram of a target box and merge box in an embodiment of the disclosure;
Fig. 4 shows a block diagram of an image segmentation apparatus according to an embodiment of the disclosure;
fig. 5 illustrates a block diagram of an electronic device 800, according to an embodiment of the disclosure;
Fig. 6 illustrates a block diagram of an electronic device 1900 according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
In scenarios such as home entertainment, online karaoke, online classes, and online meetings, a background matching the scene needs to be set to enhance immersion, so the human body and the background are segmented in the global image acquired by an image acquisition device (e.g., a device with a photo or video function, such as a camera, a video camera, a mobile phone, or a tablet). During segmentation, a target image containing a portrait (i.e., the image region corresponding to a human body) is generally selected from the global image and then input into a background segmentation model to segment the portrait from the background. The segmentation effect is related to the proportion of pixels occupied by the portrait in the target image: if the selected target image is too small, the portrait easily leaves the target image when the human body moves; if it is too large, the proportion of pixels occupied by the portrait becomes too low, easily leading to poor foreground segmentation details.
The image segmentation method provided by the embodiment of the disclosure can be applied to image segmentation in a single-person scene as well as in a multi-person scene. In the embodiment of the disclosure, the size of the target image can be adjusted in real time according to the movement rule or movement trend of the human body, so that the target image keeps up with the movement of the human body in time. This reduces both the probability that the portrait leaves the target image and the probability that the proportion of pixels occupied by the portrait in the target image becomes too small while the human body moves, effectively improving the image segmentation effect when the human body is moving.
Fig. 1 shows a flowchart of an image segmentation method according to an embodiment of the present disclosure. The image segmentation method may be performed by an electronic device such as a terminal device or a server, where the terminal device may be a user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc., and the method may be implemented by a processor invoking computer readable instructions stored in a memory. Alternatively, the method may be performed by a server. As shown in fig. 1, the image segmentation method includes:
In step S11, a movement parameter corresponding to each human frame in the first global image is acquired.
A global image is an image that contains a person. The global image may include one person or multiple persons, which is not limited by the embodiments of the present disclosure. The global image may be obtained after the image capturing device captures an image of a person in a certain spatial range, or may be an image frame including a person obtained from a video, or may be obtained by other means, which is not specifically limited in the embodiment of the present disclosure.
The first global image in step S11 may be used to represent the global image currently to be subjected to image segmentation. In the embodiment of the disclosure, after the first global image is input into the target detection model in the related art, the target detection model may output a target detection result of the first global image. The target detection result may be used to indicate a position of a human frame included in the first global image. The target detection model may be a convolutional neural network model, and the structure and the training process of the target detection model are not limited in the embodiment of the disclosure.
It can be understood that the human bodies contained in the first global image correspond one-to-one with the human body frames indicated by the target detection result. Therefore, when the first global image contains one human body, the movement parameter corresponding to one human body frame may be acquired in step S11; when the first global image contains a plurality of human bodies, the movement parameters corresponding to a plurality of human body frames may be acquired in step S11.
The movement parameters corresponding to the human body frame can be used for representing the movement rule or movement trend corresponding to the human body frame. The movement rule corresponding to the human body frame can be used for reflecting the movement condition of the human body based on the historical image analysis. The movement trend corresponding to the human body frame can be used for reflecting possible future movement conditions of the human body.
In one possible implementation, the movement parameters of the human frame include, but are not limited to, parameters for reflecting the movement of the human body, such as a movement speed, a movement amplitude, and a movement direction of the human frame. The specific process of obtaining the movement parameters corresponding to each human frame will be described in detail later in connection with possible implementation manners of the embodiments of the present disclosure, which will not be described herein.
In step S12, for each human frame, scaling the human frame according to the movement parameter to obtain a target frame corresponding to the human frame.
The movement of the human body relative to the image acquisition device may be decomposed into left-right movement and/or back-and-forth movement. When a human body moves left and right relative to the image acquisition device, if the size of the target image remains unchanged, the portrait may move out of the target image. When a human body approaches the image acquisition device, part of the body may leave the shooting view of the image acquisition device, and if the size of the target image remains unchanged, the target image may contain no portrait or only an incomplete portrait. When a human body moves away from the image acquisition device, if the size of the target image remains unchanged, the proportion of pixels occupied by the portrait in the target image may become too small. Considering that the movement parameters of a human body frame can represent the movement of the human body, in the embodiment of the disclosure the human body frame can be scaled according to its movement parameters to obtain the corresponding target frame, which on one hand reduces the possibility that the human body moves out of the target frame, and on the other hand reduces the possibility that the portrait occupies too small a proportion of the target frame.
Scaling the human frame in embodiments of the present disclosure includes contracting the human frame or expanding the human frame. The human body frame is expanded to obtain the target frame, so that the possibility that a human body moves out of the target frame can be reduced, the human body still stays in the target frame even if the human body moves correspondingly in a period of time, and the image segmentation effect is improved.
In one possible implementation manner, the step S12 of scaling the human frame according to the movement parameter corresponding to the human frame to obtain the target frame corresponding to the human frame includes determining a scaling factor of the human frame according to the movement parameter corresponding to the human frame and a second preset mapping relationship, and scaling the human frame according to the scaling factor to obtain the target frame corresponding to the human frame.
The second preset mapping relationship may be used to indicate scaling coefficients corresponding to different movement parameters. The second preset mapping relationship may be set as needed or empirically. For example, the larger the movement amplitude or the larger the movement speed, the larger the corresponding scaling factor. When the moving direction is far from the image acquisition equipment, the scaling coefficient is smaller than 1, and the larger the distance is, the smaller the scaling coefficient is, and when the moving direction is close to the image acquisition equipment, the scaling coefficient is larger than 1, and the larger the distance is, the larger the scaling coefficient is. Taking the moving direction as an example of being far away from the image acquisition equipment, namely, in the process that the human body gradually gets away from the image acquisition equipment, the ratio of the human image in the global image becomes smaller, the human body frame also needs to be correspondingly reduced, and as the human body moves from the near to the far away from the image acquisition equipment, the reduction amplitude of the human body frame also becomes larger, namely, the scaling factor for representing the reduction amplitude of the human body frame becomes smaller. Correspondingly, in the process that the moving direction is close to the image acquisition equipment, the occupation ratio of a human body in the global image is increased, the human body frame is enlarged, and the scaling factor is increased.
The image coordinate system of the global image takes the center of the global image as the origin of coordinates, with the x-axis parallel to the upper and lower sides of the global image and the y-axis parallel to the left and right sides. In one possible implementation, the scaling factor includes a scaling factor in the x-axis direction and a scaling factor in the y-axis direction of this image coordinate system. For example, in a single-person scene, the human body moves more left and right than back and forth, that is, the human body frame moves more in the x-axis direction than in the y-axis direction of the image coordinate system of the global image, so a larger scaling factor may be provided for the human body frame in the x-axis direction and a smaller one in the y-axis direction. In one example, when a person jumps left and right in front of the image acquisition device, the scaling factor in the x-axis direction of the image coordinate system of the global image may be 1.2 and the scaling factor in the y-axis direction may be 1.0.
Fig. 2 illustrates an exemplary schematic diagram of a human body frame and a target frame in an embodiment of the present disclosure. As shown in fig. 2, the scaling factor in the x-axis direction of the image coordinate system of the global image is 2, the scaling factor in the y-axis direction of the image coordinate system of the global image is 1.5, and after scaling, the length of the target frame is 2 times the length of the human frame in the x-axis direction of the image coordinate system of the global image, and the width of the target frame is 1.5 times the width of the human frame in the y-axis direction of the image coordinate system of the global image.
In one possible implementation, the scaling factor includes an expansion factor and a contraction factor. In the case where the scaling factor is greater than or equal to 1, the scaling factor may be referred to as an expansion factor, and expanding the human body frame at this time may result in a target frame, that is, an area of the target frame is greater than or equal to an area of the human body frame. In the case where the scaling factor is less than 1, the scaling factor may be referred to as a contraction factor, and the human body frame may be contracted at this time to obtain the target frame, that is, the area of the target frame is smaller than the area of the human body frame.
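The scaling of a human body frame about its center by per-axis factors can be sketched as follows; the (x_min, y_min, x_max, y_max) box format and the function name are assumptions for illustration, not part of the disclosed method:

```python
def scale_box(box, sx, sy):
    """Scale a detection box about its center.

    box: (x_min, y_min, x_max, y_max) in global-image pixel coordinates.
    sx, sy: scaling factors along the x and y axes; values >= 1 expand
    the box (expansion factor), values < 1 shrink it (contraction factor).
    """
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w = (x_max - x_min) / 2 * sx
    half_h = (y_max - y_min) / 2 * sy
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

# Example with the factors of Fig. 2: 2 in x, 1.5 in y.
target = scale_box((100, 100, 200, 300), sx=2.0, sy=1.5)
```

With sx = 2 and sy = 1.5 the resulting target frame is twice as long in the x-axis direction and 1.5 times as tall in the y-axis direction as the human body frame, matching the example of Fig. 2.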
In step S13, a first target image is determined according to the target frame corresponding to each human body frame and the first global image.
The first target image may represent an image that is subsequently used for image segmentation. In the embodiment of the disclosure, the first global image may be cut according to the positions of the target frames corresponding to the human frames, so as to obtain the first target image.
In a possible implementation manner, step S13 may include merging target frames corresponding to the human body frames to obtain a merged frame, and obtaining the first target image according to the merged frame and the first global image.
The merging frame represents the merging result of the target frames corresponding to the human frames. The first target image corresponds to a merging frame with the smallest area among merging frames capable of covering target frames corresponding to all human body frames.
Step S13 will be described below in connection with a single person scene and a multi-person scene, respectively.
In a single-person scenario, a movement parameter corresponding to one human body frame may be obtained in step S11, and a target frame corresponding to that human body frame may be obtained in step S12, so in step S13 the first target image may be cut out from the first global image according to the position of the target frame in the first global image.
In the multi-person scenario, movement parameters corresponding to a plurality of human body frames may be obtained in step S11, and target frames corresponding to the plurality of human body frames may be obtained in step S12, so in step S13 the target frames corresponding to the human body frames need to be combined into a merging frame, and the first target image is then cut out from the first global image according to the position of the merging frame. Considering that too low a proportion of pixels occupied by the portrait in the first target image may lead to poor foreground segmentation detail, the area of the merging frame should not be too large. Therefore, in the embodiment of the present disclosure, the first target image is cut from the first global image based on the merging frame with the smallest area among the merging frames capable of covering the target frames corresponding to all the human body frames. In this way, the possibility that the proportion of pixels occupied by the portrait in the first target image is too low can be reduced, and the image segmentation effect is improved.
It should be noted that, in a multi-person scenario, it is optional to combine multiple target frames to obtain a combined frame. That is, in the multi-person scene, after the target frames corresponding to the plurality of human frames are obtained, the target images may be obtained based on the respective target frames, and then the respective target images may be subjected to image segmentation, so that the human body segmentation result of the first global image may be obtained.
Fig. 3 shows an exemplary schematic diagram of a target box and a merge box in an embodiment of the disclosure. As shown in fig. 3, three target frames, each corresponding to one human body, are obtained based on the first global image. And after the three target frames are combined, a combined frame is obtained, and the combined frame can cover all the target frames. Taking two merging frames shown in fig. 3 as an example, a merging frame with the smallest area is selected from all the merging frames to perform image clipping, so that a first target image can be obtained.
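The smallest-area merging frame that covers all target frames is simply the axis-aligned bounding box of the target frames, which can be sketched as below; the (x_min, y_min, x_max, y_max) box format is an assumption for illustration:

```python
def merge_boxes(boxes):
    """Smallest axis-aligned merging frame covering every target frame.

    boxes: iterable of (x_min, y_min, x_max, y_max) target frames.
    """
    xs_min, ys_min, xs_max, ys_max = zip(*boxes)
    return (min(xs_min), min(ys_min), max(xs_max), max(ys_max))

# Three target frames, one per human body, as in Fig. 3.
merged = merge_boxes([(0, 0, 10, 10), (5, 5, 20, 15), (8, 2, 12, 30)])
```

Cropping the first global image to `merged` then yields the first target image covering all three portraits.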
In the related art, for a multi-person scene, it is necessary to acquire a target image for each person based on that person's target frame and then perform image segmentation processing on each of these target images. The number of pictures to be processed in a multi-person scene is therefore multiplied, which puts pressure on the chip performing the image segmentation processing: the computing power of the chip becomes insufficient, the processing speed drops, and meanwhile the user cannot run other functional modules on the chip in parallel, which greatly degrades the user experience.
In the embodiment of the disclosure, all target frames in a multi-person scene are combined to obtain a single target image, so only one target image needs to undergo image segmentation processing. Image segmentation in a multi-person scene is thereby reduced to a single segmentation pass, so the resources and time consumed by image segmentation in multi-person and single-person scenes are comparable, which improves efficiency, saves resources, and improves the user experience.
In step S14, image segmentation is performed on the first target image, so as to obtain a human body segmentation result of the first global image.
In the embodiment of the disclosure, after the first target image is input into the background segmentation model, a human body segmentation result of the first target image may be obtained, which indicates whether each pixel point in the first target image belongs to a human body or not. According to the human body segmentation result of the first target image and the position of the first target image in the first global image, the human body segmentation result of the first global image can be obtained, which indicates whether each pixel point in the first global image belongs to a human body or not. The background segmentation model may be implemented with reference to the related art, for example as a neural network model, and is not described in detail here.
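Mapping the target-image segmentation result back into the first global image can be sketched as pasting the target image's mask at its crop position; the boolean-mask output of the background segmentation model, the function name, and the array layout are assumptions for illustration:

```python
import numpy as np

def global_mask(target_mask, target_origin, global_shape):
    """Place the target image's human/non-human mask back into a
    full-size mask for the first global image.

    target_mask: (h, w) boolean array, True where a pixel is human.
    target_origin: (x_min, y_min) of the target image in the global image.
    global_shape: (H, W) of the first global image.
    """
    H, W = global_shape
    mask = np.zeros((H, W), dtype=bool)
    x0, y0 = target_origin
    h, w = target_mask.shape
    mask[y0:y0 + h, x0:x0 + w] = target_mask  # pixels outside stay non-human
    return mask
```

Pixels outside the target image are marked non-human directly, since the target frame was chosen to cover the portraits.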
In the embodiment of the disclosure, the size of the target image can be adjusted in real time according to the movement parameters of the human body, so that the target image can keep up with the movement of the human body in time, the probability that the human body leaves the target image in the movement process of the human body and the probability that the proportion of pixels occupied by the human body in the target image are too small are reduced, and the image segmentation effect when the human body moves is effectively improved.
Considering that, whether in a single-person scene or a multi-person scene, adopting a new target image for image segmentation every time the human body moves may make the segmentation effect unstable, smoothing processing may be performed on the target image in the embodiment of the present disclosure in order to keep the position of the target image relatively stable. The specific procedure of the smoothing process is described in detail below.
In one possible implementation manner, the step S14 may include acquiring a second target image corresponding to a second global image, determining a movement amplitude of the first target image relative to the second target image, and performing image segmentation on the first target image to obtain a human body segmentation result of the first global image when the movement amplitude of the first target image relative to the second target image is greater than a first amplitude threshold.
The second global image and the first global image belong to the same video, and the second global image is the previous frame image of the first global image. It is understood that the first global image and the second global image are the same in size, resolution, etc.
The second target image represents a target image adopted when the human body segmentation result of the second global image is acquired. And performing image segmentation processing on the second target image to obtain a human body segmentation result of the second global image. The process of acquiring the second target image may refer to the process of acquiring the first target image (step S11 to step S13), and will not be described herein.
In one possible implementation, the movement amplitude of the first target image relative to the second target image may be determined according to a coordinate difference between a preset position (e.g. the lower left corner vertex, the upper right corner vertex, or the center point) of the first target image and the same preset position of the second target image. In one example, the lower left corner vertex of the first target image is at coordinates (100, 100) in the first global image, the lower left corner vertex of the second target image is at coordinates (200, 100) in the second global image, and the movement amplitude of the first target image relative to the second target image is therefore 100 pixels.
The first amplitude threshold may be set as desired, for example, the first amplitude threshold may be set to 50 pixels or 150 pixels, or the like. When the movement amplitude of the first target image relative to the second target image is larger than the first amplitude threshold, the human body is shown to move in a larger amplitude, and at this time, in order to improve the image segmentation effect, the first target image can be subjected to image segmentation to obtain a human body segmentation result of the first global image.
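The smoothing decision above can be sketched as follows, assuming the preset position is the lower left corner vertex, a Euclidean pixel distance, and an illustrative threshold of 50 pixels; the function and constant names are assumptions:

```python
import math

def movement_amplitude(first_box, second_box):
    """Pixel distance between the preset positions (here: lower left
    corner vertices) of the first and second target images."""
    ax, ay = first_box[0], first_box[1]
    bx, by = second_box[0], second_box[1]
    return math.hypot(ax - bx, ay - by)

FIRST_AMPLITUDE_THRESHOLD = 50  # pixels, set as needed

def select_target(first_box, second_box):
    """Smoothing: reuse the previous target image for small movements."""
    if movement_amplitude(first_box, second_box) > FIRST_AMPLITUDE_THRESHOLD:
        return first_box   # large movement: segment the new target image
    return second_box      # small movement: keep the previous target image
```

In the 100-pixel example above, the amplitude exceeds the threshold, so the first target image is segmented.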
In a possible implementation manner, the method may further include performing image segmentation by using the second target image under the condition that the movement amplitude of the first target image relative to the second target image is smaller than or equal to the first amplitude threshold value, so as to obtain a human body segmentation result of the first global image.
When the moving amplitude of the first target image relative to the second target image is smaller than or equal to the first amplitude threshold, the moving amplitude of the human body is smaller, and at this time, in order to improve the stability of the image for image segmentation, the second target image can be subjected to image segmentation, so that a human body segmentation result of the first global image can be obtained.
Considering that the relative movement amplitude between the target images corresponding to adjacent global images is small in the case of slower movement of the human body, the target images adopted in image segmentation may not be updated timely. In order to update the target image in time, in the embodiment of the present disclosure, the update process may be performed on the target image. The specific procedure of the update process is described in detail below.
In a possible implementation manner, step S14 may include acquiring a second target image corresponding to a second global image, determining a coverage rate of the first target image with respect to the second target image, and performing image segmentation on the first target image to obtain a human body segmentation result of the first global image if the coverage rate is smaller than a second amplitude threshold.
In one example, a ratio of an overlapping area of the first target image and the second target image to an area of the second target image may be determined as a coverage of the first target image relative to the second target image.
The second amplitude threshold may be set as desired, for example, the second amplitude threshold may be 40% or 50%, or the like. Under the condition that the coverage rate of the first target image relative to the second target image is smaller than a second amplitude threshold, the human body is shown to move in a larger amplitude, and the target image for image segmentation needs to be updated, so that the first target image can be subjected to image segmentation, and the human body segmentation effect of the first global image can be obtained.
In a possible implementation manner, the method may further include performing image segmentation by using the second target image to obtain a human body segmentation result of the first global image when the coverage rate is greater than or equal to the second amplitude threshold.
When the coverage rate of the first target image relative to the second target image is greater than or equal to the second amplitude threshold, the movement amplitude is small, and at this time, in order to keep the portrait stable, the second target image can be subjected to image segmentation to obtain the human body segmentation result of the first global image. In this way, the target image used when segmenting the previous frame is reused when segmenting the current frame, so the same image is segmented in both frames, the corresponding segmentation result does not change, and the segmented portrait does not shake, which maintains the stability of the portrait and improves the user experience.
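The coverage computation from the example above (overlap area divided by the area of the second target image) can be sketched as below, assuming the (x_min, y_min, x_max, y_max) box format; the function name is an assumption:

```python
def coverage(first_box, second_box):
    """Overlap area of the two target images divided by the area of the
    second (previous) target image."""
    overlap_x = max(0, min(first_box[2], second_box[2]) - max(first_box[0], second_box[0]))
    overlap_y = max(0, min(first_box[3], second_box[3]) - max(first_box[1], second_box[1]))
    area_second = (second_box[2] - second_box[0]) * (second_box[3] - second_box[1])
    return overlap_x * overlap_y / area_second
```

A value below the second amplitude threshold (e.g. 40% or 50%) triggers an update to the first target image; otherwise the second target image is kept.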
The specific process of acquiring the movement parameter corresponding to each human frame in the first global image is described in detail below. In consideration of the fact that the first preset mapping relation for indicating the movement parameters corresponding to the human frames at different distances in different scenes is required to be used in the process, the process of acquiring the first preset mapping relation is described first.
The first preset mapping relation comprises a scene, a distance between a person in the global image and a reference position of the video and a movement parameter. In a possible implementation manner, the method further comprises the steps of obtaining a first video, carrying out limb tracking on a target person in the first video to obtain positions of human frames corresponding to the target person in each frame image of the first video, determining moving speed, moving amplitude and second distance of the target person according to the positions of the human frames corresponding to the target person in each frame image of the first video, obtaining a first moving parameter according to the moving speed and the moving amplitude of the target person, and establishing the first preset mapping relation based on the first scene, the second distance and the first moving parameter.
The first video corresponds to a first scene, which may be a single person scene or a multi-person scene, and is used for recording movement conditions of the single person or the multi-person in a first movement range, wherein the first movement range is a first distance from a second image acquisition device, and the second image acquisition device represents an image acquisition device for acquiring the first video.
In one example, after erection of the second image capturing device, movement may be performed by a person within a first range of movement at a first distance (e.g., 1 meter, 3 meters, or 5 meters, etc.) from the second image capturing device. The second image capturing device may capture a moving video of the person as the first video. And tracking the target person in the first video by using a limb tracking technology, so that the position of the human frame corresponding to the target person in each frame of image of the first video can be obtained. And determining the moving speed, the moving amplitude and the second distance of the target person according to the positions of the human frames corresponding to the target person in each frame of image of the first video, so as to obtain a first moving parameter. And establishing the first preset mapping relation based on the first scene, the second distance and the first movement parameter.
Wherein the second distance is used to indicate the distance of the target person from a reference position of the first video. Specifically, the distance between a preset position of the target person (e.g. the lower left corner vertex, the upper right corner vertex, or the center point of the corresponding human body frame) and the reference position of the first video may be determined as the second distance corresponding to the target person. The reference position of the first video may be a position pre-designated in the first video, such as the lower boundary line or the upper boundary line of the first video; correspondingly, the second distance may be the distance between the target person and the lower boundary line of the first video, or the distance between the target person and the upper boundary line of the first video. The size of the second distance can characterize the distance between the target person and the second image acquisition device. Taking the second distance indicating the distance between the target person and the lower boundary line of the first video as an example, the larger the second distance, the closer the target person is to the second image acquisition device, and the smaller the second distance, the farther the target person is from the second image acquisition device. Taking the second distance indicating the distance between the target person and the upper boundary line of the first video as an example, the larger the second distance, the farther the target person is from the second image acquisition device, and the smaller the second distance, the closer the target person is to the second image acquisition device.
In one possible implementation manner, the method further comprises: when the first scene is a single-person scene, setting the moving frequency of the person in the first video to be greater than a first frequency threshold and the length of the projection of the first movement range in the x-axis direction of the camera coordinate system of the second image acquisition device to be greater than a first movement threshold; and when the first scene is a multi-person scene, setting the moving frequency of the persons in the first video to be less than or equal to a second frequency threshold and the length of the projection of the first movement range in the x-axis direction of the camera coordinate system of the second image acquisition device to be less than or equal to a second movement threshold, where the second frequency threshold is less than or equal to the first frequency threshold and the second movement threshold is less than or equal to the first movement threshold. In one example, the first movement range may be rectangular, and the shortest distance between the lower edge of the first movement range and the second image acquisition device may be determined as the first distance. In yet another example, the first movement range may be a sector ring (i.e. a portion of a circular ring) centered on the second image acquisition device, and the inner radius of the first movement range may be determined as the first distance.
It can be understood that, because the movement of a person in a single-person scene is less restricted and the movement space is larger, the movement range of a person is larger in a single-person scene and smaller in a multi-person scene. Also, the smaller the first distance, the larger the movement amplitude and movement speed; the larger the first distance, the smaller the movement amplitude and movement speed. Therefore, the moving frequency and movement range set for the person in the first video of a single-person scene are larger than those set for the first video of a multi-person scene. In addition, the target person in a multi-person scene may be any one or more of the plurality of persons.
It should be noted that, the first frequency threshold, the first movement threshold, the second frequency threshold and the second movement threshold may be set according to needs, and only the second frequency threshold is required to be set to be smaller than or equal to the first frequency threshold, and the second movement threshold is required to be smaller than or equal to the first movement threshold.
Thus, a first preset mapping relation is obtained. On this basis, a process of acquiring a movement parameter corresponding to each human frame in the first global image is described.
In a possible implementation manner, the step S11 of obtaining the movement parameters corresponding to each human body frame in the first global image may include: performing target detection on the first global image to obtain a target detection result of the first global image; determining scene information of the first global image and distance information corresponding to each human body frame in the first global image according to the target detection result of the first global image; and, for each human body frame, determining the movement parameters corresponding to the human body frame according to the scene information, the distance information corresponding to the human body frame, and a first preset mapping relationship.
The target detection result may be used to indicate the positions of the human body frames contained in the first global image, the scene information may be used to indicate whether the first global image is a global image in a single-person scene or a global image in a multi-person scene, and the distance information may be used to indicate the distance between the human body in a human body frame and the first image acquisition device, where the first image acquisition device represents the image acquisition device that captured the first global image.
When the target detection result indicates the position of one human body frame, it may be determined that the first global image is a global image in a single-person scene. When the target detection result indicates the positions of a plurality of human body frames, it may be determined that the first global image is a global image in a multi-person scene.
In one example, the distance information corresponding to a human body frame may be determined from the position of the human body frame in the first global image. Specifically, the coordinate of a preset position of the human body frame (for example, the lower-left corner vertex, the upper-right corner vertex, or the center point) in the y-axis direction of the first global image may be determined as the distance information corresponding to the human body frame. Taking the coordinate system shown in fig. 2 as an example, the smaller the coordinate value of the center point of the human body frame in the y-axis direction of the first global image, the closer the human body is to the first image acquisition device; the larger the coordinate value, the farther the human body is from the first image acquisition device.
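As a minimal sketch of this example (assuming the fig. 2 convention that a smaller y value means a shorter distance to the camera, a box layout of (x1, y1, x2, y2), and the center point as the preset position), the distance information could be computed as:

```python
def distance_info(box):
    """Distance proxy for a human body frame given as (x1, y1, x2, y2).

    Uses the y-coordinate of the frame's center point: under the fig. 2
    coordinate system, a smaller value means the human body is closer
    to the first image acquisition device.
    """
    x1, y1, x2, y2 = box
    return (y1 + y2) / 2.0
```

Any other preset position (for example the lower-left corner vertex) would be substituted the same way, returning that point's y-coordinate instead.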
In the embodiment of the disclosure, the matched first preset mapping relation may be looked up according to the scene information and the distance information corresponding to the human body frame, and the movement parameter in the matched first preset mapping relation may be determined as the movement parameter corresponding to the human body frame.
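The lookup of the matched first preset mapping relation can be illustrated with a hypothetical table; the scene labels, distance bands, threshold value and movement-parameter names below are all illustrative assumptions, not values from the disclosure:

```python
# Hypothetical first preset mapping: (scene, distance band) -> movement parameter.
FIRST_PRESET_MAPPING = {
    ("single", "near"): "fast_large",
    ("single", "far"): "fast_small",
    ("multi", "near"): "slow_large",
    ("multi", "far"): "slow_small",
}

def movement_parameter(scene, distance, near_threshold=240.0):
    """Look up the movement parameter for one human body frame.

    `distance` is the distance information (e.g. the center-point
    y-coordinate); smaller values are treated as nearer to the camera.
    """
    band = "near" if distance < near_threshold else "far"
    return FIRST_PRESET_MAPPING[(scene, band)]
```

A real implementation might use finer distance bands, but the lookup pattern (scene plus distance band keying into a preset table) is the same.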
It may be understood that the above-mentioned method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from principles and logic; for brevity, details are not repeated in the present disclosure. It will be appreciated by those skilled in the art that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure further provides an image segmentation apparatus, an electronic device, a computer-readable storage medium, and a program, each of which may be used to implement any of the image segmentation methods provided in the present disclosure. For the corresponding technical solutions and descriptions, reference may be made to the corresponding descriptions of the method parts, which are not repeated here.
Fig. 4 shows a block diagram of an image segmentation apparatus according to an embodiment of the disclosure. As shown in fig. 4, the apparatus 40 includes:
the first obtaining module 41 is configured to obtain a movement parameter corresponding to each human frame in the first global image, where the movement parameter is used to represent a movement rule or a movement trend corresponding to the human frame;
A scaling module 42, configured to scale, for each human frame, the human frame according to the movement parameter acquired by the first acquiring module 41, to obtain a target frame corresponding to the human frame;
A first determining module 43, configured to determine a first target image according to the target frames corresponding to the human frames obtained by scaling by the scaling module 42 and the first global image;
The first segmentation module 44 is configured to perform image segmentation on the first target image determined by the first determination module 43, so as to obtain a human body segmentation result of the first global image.
In the embodiment of the disclosure, the size of the target image can be adjusted in real time according to the movement rule or movement trend of the human body, so that the target image can keep up with the movement of the human body in time, the probability that the human image leaves the target image and the probability that the proportion of pixels occupied by the human image in the target image is too small are reduced in the movement process of the human body, and the image segmentation effect when the human body moves is effectively improved.
In one possible implementation manner, the first obtaining module is further configured to:
performing target detection on the first global image to obtain a target detection result of the first global image, wherein the target detection result is used for indicating the position of a human frame included in the first global image;
Determining scene information of the first global image and distance information corresponding to each human frame in the first global image according to a target detection result of the first global image, wherein the scene information is used for indicating whether the first global image is a global image in a single person scene or a global image in a multiple person scene, the distance information is used for indicating the distance between a human body in the human frame and a first image acquisition device, and the first image acquisition device is used for acquiring the first global image;
And aiming at each human body frame, obtaining the movement parameters corresponding to the human body frame according to the scene information, the distance information corresponding to the human body frame and a first preset mapping relation, wherein the first preset mapping relation is used for indicating the movement parameters corresponding to the human body frames with different distances under different scenes.
In one possible implementation, the apparatus further includes:
the second acquisition module is used for acquiring a first video, wherein the first video corresponds to a first scene, the first scene is a single-person scene or a multi-person scene, the first video is used for recording the movement of a single person or a plurality of persons within a first movement range, the first movement range is at a first distance from a second image acquisition device, and the second image acquisition device is used for acquiring the first video;
The tracking module is used for carrying out limb tracking on the target person in the first video to obtain the position of a human frame corresponding to the target person in each frame of image of the first video;
The second determining module is used for determining the moving speed, the moving amplitude and a second distance of the target person according to the positions of the human frames corresponding to the target person in each frame image of the first video, wherein the second distance is used for indicating the distance between the target person and the reference position of the first video;
the third acquisition module is used for acquiring a first movement parameter according to the movement speed and the movement amplitude of the target person;
the establishing module is configured to establish the first preset mapping relationship based on the first scene, the second distance, and the first movement parameter.
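One way the second determining module's statistics could be derived from the tracked per-frame positions is sketched below; treating horizontal displacement as the movement measure and the mean center y-coordinate as the second-distance proxy are assumptions for illustration, not the disclosure's prescribed formulas:

```python
def movement_stats(centers, fps):
    """Derive movement speed, movement amplitude and a second-distance proxy
    from per-frame human body frame center points of the target person.

    centers: list of (x, y) center points, one per frame of the first video.
    fps: frame rate of the first video.
    """
    xs = [c[0] for c in centers]
    # Movement amplitude: horizontal span covered by the target person.
    amplitude = max(xs) - min(xs)
    # Movement speed: mean per-frame horizontal displacement, scaled to per second.
    total = sum(abs(xs[i + 1] - xs[i]) for i in range(len(xs) - 1))
    speed = total * fps / max(len(xs) - 1, 1)
    # Second distance: mean y-coordinate of the centers as a distance proxy.
    second_distance = sum(c[1] for c in centers) / len(centers)
    return speed, amplitude, second_distance
```

The first movement parameter would then be read off from these speed and amplitude values, and the (first scene, second distance, first movement parameter) triple recorded as one entry of the first preset mapping relation.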
In one possible implementation, the apparatus further includes:
the first setting module is used for setting that the moving frequency of a person in a first video is larger than a first frequency threshold value and the length of projection of the first moving range in the x-axis direction of a camera coordinate system of the second image acquisition device is larger than a first moving threshold value when the first scene is a single person scene;
the second setting module is used for setting that the moving frequency of the person in the first video is smaller than or equal to a second frequency threshold value and the length of the projection of the first moving range in the x-axis direction of the camera coordinate system of the second image acquisition device is smaller than or equal to a second moving threshold value under the condition that the first scene is a multi-person scene;
wherein the second frequency threshold is less than or equal to the first frequency threshold and the second movement threshold is less than or equal to the first movement threshold.
In one possible implementation, the scaling module is further configured to:
determining a scaling factor of the human body frame according to the movement parameter corresponding to the human body frame and a second preset mapping relation;
and scaling the human body frame according to the scaling coefficient to obtain a target frame corresponding to the human body frame, wherein the second preset mapping relation is used for indicating scaling coefficients corresponding to different movement parameters.
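A sketch of this scaling step, with a hypothetical second preset mapping (the coefficients and movement-parameter names are illustrative assumptions):

```python
# Hypothetical second preset mapping: movement parameter -> scaling coefficient.
SECOND_PRESET_MAPPING = {
    "fast_large": 1.5,
    "fast_small": 1.3,
    "slow_large": 1.2,
    "slow_small": 1.1,
}

def scale_frame(box, movement_param):
    """Scale a human body frame (x1, y1, x2, y2) about its center point
    by the coefficient mapped from the movement parameter, yielding the
    target frame."""
    k = SECOND_PRESET_MAPPING[movement_param]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * k / 2.0, (y2 - y1) * k / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

Larger coefficients for faster or wider-ranging movement leave more margin around the human body, which is what lets the target frame keep up with motion between frames.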
In one possible implementation manner, the first determining module is further configured to:
merging the target frames corresponding to the human body frames to obtain a merged frame;
obtaining the first target image according to the merged frame and the first global image;
wherein the first target image corresponds to the merged frame with the smallest area among the frames capable of covering the target frames corresponding to all the human body frames.
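The merging step can be sketched as taking the smallest axis-aligned frame covering all target frames (assuming frames are given as (x1, y1, x2, y2) tuples):

```python
def merge_frames(target_frames):
    """Smallest-area axis-aligned frame covering every target frame."""
    xs1, ys1, xs2, ys2 = zip(*target_frames)
    return (min(xs1), min(ys1), max(xs2), max(ys2))
```

In a single-person scene this reduces to the one target frame itself; in a multi-person scene it is the tightest frame enclosing all of them.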
In one possible implementation, the first segmentation module is further configured to:
Acquiring a second target image corresponding to a second global image, wherein the second global image is a previous frame image of the first global image in a video, and the second target image represents a target image adopted when a human body segmentation result of the second global image is acquired;
Determining a movement amplitude of the first target image relative to the second target image;
and under the condition that the moving amplitude of the first target image relative to the second target image is larger than a first amplitude threshold, image segmentation is carried out on the first target image, and a human body segmentation result of the first global image is obtained.
In one possible implementation, the apparatus further includes:
And the second segmentation module is used for carrying out image segmentation by adopting the second target image under the condition that the movement amplitude of the first target image relative to the second target image is smaller than or equal to the first amplitude threshold value, so as to obtain the human body segmentation result of the first global image.
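The decision between segmenting the new first target image and reusing the second target image could look like the following sketch; measuring the movement amplitude as the largest shift of the top-left corner coordinates is an assumption for illustration:

```python
def choose_target_image(first_region, second_region, first_amplitude_threshold):
    """Decide whether to segment the new first target image or reuse the
    second target image (the one used for the previous frame).

    Regions are (x1, y1, x2, y2); the movement amplitude is taken here as
    the larger shift of the top-left corner along either axis.
    """
    dx = abs(first_region[0] - second_region[0])
    dy = abs(first_region[1] - second_region[1])
    amplitude = max(dx, dy)
    if amplitude > first_amplitude_threshold:
        return "segment_first_target_image"
    return "reuse_second_target_image"
```

Reusing the previous target image when the region has barely moved avoids re-cropping and re-running segmentation on nearly identical inputs.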
In one possible implementation, the first segmentation module is further configured to:
acquiring a second target image corresponding to a second global image, wherein the second global image and the first global image belong to the same video, the second global image is a previous frame image of the first global image, and the second target image represents a target image adopted when a human body segmentation result of the second global image is acquired;
Determining a coverage of the first target image relative to the second target image;
and under the condition that the coverage rate is smaller than a second amplitude threshold value, performing image segmentation on the first target image to obtain a human body segmentation result of the first global image.
In one possible implementation, the apparatus further includes:
And the third segmentation module is used for carrying out image segmentation by adopting the second target image under the condition that the coverage rate is larger than or equal to the second amplitude threshold value to obtain a human body segmentation result of the first global image.
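The coverage rate used by this implementation can be sketched as the fraction of the first target image's area overlapped by the second target image (a region layout of (x1, y1, x2, y2) is assumed):

```python
def coverage_rate(first_region, second_region):
    """Fraction of the first target image covered by the second target image."""
    ax1, ay1, ax2, ay2 = first_region
    bx1, by1, bx2, by2 = second_region
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    area_first = (ax2 - ax1) * (ay2 - ay1)
    return (inter_w * inter_h) / area_first if area_first > 0 else 0.0
```

A coverage rate at or above the second amplitude threshold indicates the previous target image still contains most of the new region, so it may be reused; below the threshold, the first target image is segmented afresh.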
The method has a specific technical association with the internal structure of a computer system, and can solve technical problems of improving hardware operation efficiency or execution effect (including reducing the amount of data stored, reducing the amount of data transmitted, increasing the hardware processing speed, and the like), thereby obtaining a technical effect of improving the internal performance of the computer system in accordance with the laws of nature.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiment of the disclosure also provides electronic equipment, which comprises a processor and a memory for storing instructions executable by the processor, wherein the processor is configured to call the instructions stored by the memory so as to execute the method.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
The electronic device may be provided as a terminal, server or other form of device.
Fig. 5 illustrates a block diagram of an electronic device 800, according to an embodiment of the disclosure. For example, the electronic device 800 may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like.
Referring to FIG. 5, the electronic device 800 can include one or more of a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to, a home button, a volume button, an activate button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessment of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an on/off state of the electronic device 800, a relative positioning of the components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of a user's contact with the electronic device 800, an orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a photosensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communication between the electronic device 800 and other devices, either wired or wireless. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (Wi-Fi), a second generation mobile communication technology (2G), a third generation mobile communication technology (3G), a fourth generation mobile communication technology (4G), long Term Evolution (LTE) of a universal mobile communication technology, a fifth generation mobile communication technology (5G), or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including computer program instructions executable by processor 820 of electronic device 800 to perform the above-described methods.
The present disclosure relates to the field of augmented reality, and more particularly, to the field of augmented reality, in which, by acquiring image information of a target object in a real environment, detection or identification processing of relevant features, states and attributes of the target object is further implemented by means of various visual correlation algorithms, so as to obtain an AR effect combining virtual and reality matching with a specific application. By way of example, the target object may relate to a face, limb, gesture, action, etc. associated with a human body, or a marker, a marker associated with an object, or a sand table, display area, or display item associated with a venue or location, etc. Vision related algorithms may involve vision localization, SLAM, three-dimensional reconstruction, image registration, background segmentation, key point extraction and tracking of objects, pose or depth detection of objects, and so forth. The specific application not only can relate to interactive scenes such as navigation, explanation, reconstruction, virtual effect superposition display and the like related to real scenes or articles, but also can relate to interactive scenes such as makeup beautification, limb beautification, special effect display, virtual model display and the like related to people. The detection or identification processing of the relevant characteristics, states and attributes of the target object can be realized through a convolutional neural network. The convolutional neural network is a network model obtained by performing model training based on a deep learning framework.
Fig. 6 illustrates a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server or terminal device. Referring to FIG. 6, electronic device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical user interface-based operating system promoted by Apple Inc. (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical encoding device, punch cards or intra-groove protrusion structures such as those having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
The computer program instructions for performing the operations of the present disclosure may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of computer readable program instructions, which can execute the computer readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
The foregoing description of various embodiments is intended to highlight differences between the various embodiments, which may be the same or similar to each other by reference, and is not repeated herein for the sake of brevity.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
If the technical solution of the application relates to personal information, a product applying the technical solution of the application clearly informs the user of the personal information processing rules and obtains the individual's voluntary consent before processing the personal information. If the technical solution of the application relates to sensitive personal information, a product applying the technical solution of the application obtains the individual's separate consent before processing the sensitive personal information, and at the same time meets the requirement of "explicit consent". For example, a clear and conspicuous sign is set up at a personal information collection device such as a camera to inform the user that he or she is entering the personal information collection range and that personal information will be collected; if the individual voluntarily enters the collection range, this is regarded as consent to the collection of his or her personal information. Alternatively, a conspicuous sign or notice on the personal information processing device informs the user of the personal information processing rules, and personal authorization is obtained through pop-up information or by the individual uploading his or her personal information. The personal information processing rules may include information such as the personal information processor, the purpose of the personal information processing, the processing method, and the types of personal information processed.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the embodiments described. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. An image segmentation method, characterized in that the method comprises:
obtaining a movement parameter corresponding to each human body frame in a first global image, the movement parameter representing a movement pattern or movement trend of the human body frame;
for each human body frame, scaling the human body frame according to the movement parameter to obtain a target frame corresponding to the human body frame;
determining a first target image according to the target frames corresponding to the human body frames and the first global image; and
performing image segmentation on the first target image to obtain a human body segmentation result of the first global image;
wherein obtaining the movement parameter corresponding to each human body frame in the first global image comprises:
performing target detection on the first global image to obtain a target detection result of the first global image, the target detection result indicating positions of the human body frames included in the first global image;
determining, according to the target detection result, scene information of the first global image and distance information corresponding to each human body frame in the first global image, the scene information indicating whether the first global image is a global image of a single-person scene or of a multi-person scene, and the distance information indicating a distance between the human body in the human body frame and a first image acquisition device used to acquire the first global image; and
for each human body frame, obtaining the movement parameter corresponding to the human body frame according to the scene information, the distance information corresponding to the human body frame, and a first preset mapping relationship, the first preset mapping relationship indicating movement parameters corresponding to human body frames at different distances in different scenes.

2. The method according to claim 1, characterized in that the method further comprises:
acquiring a first video corresponding to a first scene, the first scene being a single-person scene or a multi-person scene, and the first video recording the movement of one or more persons within a first movement range, wherein the first movement range is a first distance from a second image acquisition device used to acquire the first video;
performing body tracking on a target person in the first video to obtain the position of the human body frame corresponding to the target person in each frame of the first video;
determining, according to the positions of the human body frame corresponding to the target person in the frames of the first video, the movement speed and movement amplitude of the target person and a second distance, the second distance indicating the distance between the target person and a reference position of the first video;
obtaining a first movement parameter according to the movement speed and movement amplitude of the target person; and
establishing the first preset mapping relationship based on the first scene, the second distance, and the first movement parameter.

3. The method according to claim 2, characterized in that the method further comprises:
in the case that the first scene is a single-person scene, setting the movement frequency of the person in the first video to be greater than a first frequency threshold, and the length of the projection of the first movement range onto the x-axis of the camera coordinate system of the second image acquisition device to be greater than a first movement threshold; and
in the case that the first scene is a multi-person scene, setting the movement frequency of the persons in the first video to be less than or equal to a second frequency threshold, and the length of the projection of the first movement range onto the x-axis of the camera coordinate system of the second image acquisition device to be less than or equal to a second movement threshold;
wherein the second frequency threshold is less than or equal to the first frequency threshold, and the second movement threshold is less than or equal to the first movement threshold.

4. The method according to any one of claims 1 to 3, characterized in that scaling the human body frame according to the movement parameter corresponding to the human body frame to obtain the target frame corresponding to the human body frame comprises:
determining a scaling factor for the human body frame according to the movement parameter corresponding to the human body frame and a second preset mapping relationship, the second preset mapping relationship indicating scaling factors corresponding to different movement parameters; and
scaling the human body frame by the scaling factor to obtain the target frame corresponding to the human body frame.

5. The method according to any one of claims 1 to 3, characterized in that determining the first target image according to the target frames corresponding to the human body frames and the first global image comprises:
merging the target frames corresponding to the human body frames to obtain a merged frame; and
obtaining the first target image according to the merged frame and the first global image;
wherein the first target image corresponds to the merged frame of smallest area among the merged frames capable of covering the target frames corresponding to all the human body frames.

6. The method according to any one of claims 1 to 3, characterized in that performing image segmentation on the first target image to obtain the human body segmentation result of the first global image comprises:
obtaining a second target image corresponding to a second global image, the second global image being the frame preceding the first global image in a video, and the second target image being the target image used when obtaining the human body segmentation result of the second global image;
determining the movement amplitude of the first target image relative to the second target image; and
in the case that the movement amplitude of the first target image relative to the second target image is greater than a first amplitude threshold, performing image segmentation on the first target image to obtain the human body segmentation result of the first global image.

7. The method according to claim 6, characterized in that the method further comprises:
in the case that the movement amplitude of the first target image relative to the second target image is less than or equal to the first amplitude threshold, performing image segmentation using the second target image to obtain the human body segmentation result of the first global image.

8. The method according to any one of claims 1 to 3, characterized in that performing image segmentation on the first target image to obtain the human body segmentation result of the first global image comprises:
obtaining a second target image corresponding to a second global image, the second global image belonging to the same video as the first global image and being the frame preceding the first global image, and the second target image being the target image used when obtaining the human body segmentation result of the second global image;
determining the coverage of the first target image relative to the second target image; and
in the case that the coverage is less than a second amplitude threshold, performing image segmentation on the first target image to obtain the human body segmentation result of the first global image.

9. The method according to claim 8, characterized in that the method further comprises:
in the case that the coverage is greater than or equal to the second amplitude threshold, performing image segmentation using the second target image to obtain the human body segmentation result of the first global image.

10. An image segmentation apparatus, characterized in that the apparatus comprises:
a first acquisition module, configured to obtain a movement parameter corresponding to each human body frame in a first global image, the movement parameter representing a movement pattern or movement trend of the human body frame;
a scaling module, configured to scale each human body frame according to the movement parameter obtained by the first acquisition module to obtain a target frame corresponding to the human body frame;
a first determining module, configured to determine a first target image according to the target frames obtained by the scaling module and the first global image; and
a first segmentation module, configured to perform image segmentation on the first target image determined by the first determining module to obtain a human body segmentation result of the first global image;
wherein the first acquisition module is further configured to:
perform target detection on the first global image to obtain a target detection result of the first global image, the target detection result indicating positions of the human body frames included in the first global image;
determine, according to the target detection result, scene information of the first global image and distance information corresponding to each human body frame in the first global image, the scene information indicating whether the first global image is a global image of a single-person scene or of a multi-person scene, and the distance information indicating a distance between the human body in the human body frame and a first image acquisition device used to acquire the first global image; and
for each human body frame, obtain the movement parameter corresponding to the human body frame according to the scene information, the distance information corresponding to the human body frame, and a first preset mapping relationship, the first preset mapping relationship indicating movement parameters corresponding to human body frames at different distances in different scenes.

11. An electronic device, characterized in that it comprises:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to execute the method according to any one of claims 1 to 9.

12. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 9.
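Claims 1, 4, and 5 can be read as a small piece of box geometry: each detected human body frame is enlarged by a factor looked up from its movement parameter, and the enlarged boxes are merged into the smallest rectangle covering them all. A minimal sketch in Python, where the `(x1, y1, x2, y2)` box format and the movement-to-scale table are illustrative assumptions, not values from the patent:

```python
def scale_box(box, factor):
    """Scale an (x1, y1, x2, y2) box about its center by `factor` (claim 4)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    hw, hh = (x2 - x1) / 2 * factor, (y2 - y1) / 2 * factor
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def merge_boxes(boxes):
    """Smallest axis-aligned box covering all input boxes (claim 5)."""
    xs1, ys1, xs2, ys2 = zip(*boxes)
    return (min(xs1), min(ys1), max(xs2), max(ys2))

# Hypothetical second preset mapping: faster movement -> larger margin.
SCALE_BY_MOVEMENT = {"slow": 1.1, "medium": 1.25, "fast": 1.5}

detections = [((10, 10, 30, 50), "slow"), ((40, 20, 60, 60), "fast")]
targets = [scale_box(b, SCALE_BY_MOVEMENT[m]) for b, m in detections]
merged = merge_boxes(targets)  # crop region for the first target image, ~ (9, 8, 65, 70)
```

The merged rectangle is then used to crop the first target image from the first global image, so the segmentation network only processes the region people can plausibly move into.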
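Claim 2 derives a movement speed and a movement amplitude from the tracked box positions of a target person across the frames of the calibration video. One plausible reading of that statistic, sketched here using only the horizontal box centers (the actual method may use both axes and different definitions):

```python
def movement_stats(track):
    """track: list of (x1, y1, x2, y2) boxes, one per frame.

    Returns (mean per-frame speed, total amplitude) of the horizontal
    box center -- an assumed reading of the speed/amplitude in claim 2.
    """
    xs = [(x1 + x2) / 2 for x1, _, x2, _ in track]
    speeds = [abs(b - a) for a, b in zip(xs, xs[1:])]
    speed = sum(speeds) / len(speeds) if speeds else 0.0
    amplitude = max(xs) - min(xs)
    return speed, amplitude

# A person drifting steadily right by 10 px per frame:
stats = movement_stats([(0, 0, 10, 10), (10, 0, 20, 10), (20, 0, 30, 10)])
# stats == (10.0, 20.0)
```

Per claim 2, values like these, collected per scene and per distance band, would be folded into the first preset mapping relationship that claim 1 later looks up at inference time.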
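Claims 8 and 9 reuse the previous frame's crop as long as it still covers enough of the new target region, re-cropping and re-segmenting only when coverage drops below a threshold. A sketch of that decision, where the coverage definition (intersection area over current-box area) and the 0.9 threshold are assumptions for illustration:

```python
def coverage(curr, prev):
    """Fraction of `curr`'s area that `prev` covers; boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(curr[0], prev[0]), max(curr[1], prev[1])
    ix2, iy2 = min(curr[2], prev[2]), min(curr[3], prev[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (curr[2] - curr[0]) * (curr[3] - curr[1])
    return inter / area if area else 0.0

def choose_crop(curr, prev, threshold=0.9):
    """Claims 8-9: keep the previous crop while it still covers the new one."""
    return prev if coverage(curr, prev) >= threshold else curr
```

Keeping the old crop when coverage is high avoids jitter in the segmented region between consecutive video frames; only a substantial shift of the subjects forces a new crop.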
CN202210322862.0A 2022-03-29 2022-03-29 Image segmentation method and device, electronic equipment and storage medium Active CN114638817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210322862.0A CN114638817B (en) 2022-03-29 2022-03-29 Image segmentation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210322862.0A CN114638817B (en) 2022-03-29 2022-03-29 Image segmentation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114638817A CN114638817A (en) 2022-06-17
CN114638817B true CN114638817B (en) 2025-11-21

Family

ID=81951284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210322862.0A Active CN114638817B (en) 2022-03-29 2022-03-29 Image segmentation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114638817B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119902732A (en) * 2023-10-25 2025-04-29 广州视源电子科技股份有限公司 Character close-up method, device, equipment and storage medium based on conference tablet
CN119323579A (en) * 2024-09-29 2025-01-17 浪潮智慧科技有限公司 Method, system, terminal and medium for dividing image to generate jigsaw cutting path

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062761A (en) * 2017-12-25 2018-05-22 北京奇虎科技有限公司 Image partition method, device and computing device based on adaptive tracing frame
CN112019868A (en) * 2019-05-31 2020-12-01 广州虎牙信息科技有限公司 Portrait segmentation method and device and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005223487A (en) * 2004-02-04 2005-08-18 Mainichi Broadcasting System Inc Digital camera work apparatus, digital camera work method, and digital camera work program
US10102635B2 (en) * 2016-03-10 2018-10-16 Sony Corporation Method for moving object detection by a Kalman filter-based approach
TWI711007B (en) * 2019-05-02 2020-11-21 緯創資通股份有限公司 Method and computing device for adjusting region of interest

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062761A (en) * 2017-12-25 2018-05-22 北京奇虎科技有限公司 Image partition method, device and computing device based on adaptive tracing frame
CN112019868A (en) * 2019-05-31 2020-12-01 广州虎牙信息科技有限公司 Portrait segmentation method and device and electronic equipment

Also Published As

Publication number Publication date
CN114638817A (en) 2022-06-17

Similar Documents

Publication Publication Date Title
US11288531B2 (en) Image processing method and apparatus, electronic device, and storage medium
CN111626183B (en) Target object display method and device, electronic equipment and storage medium
CN109584362B (en) Three-dimensional model construction method and device, electronic equipment and storage medium
CN109840917B (en) Image processing method and device and network training method and device
CN111401230B (en) Gesture estimation method and device, electronic equipment and storage medium
CN113822798B (en) Method and device for training generation countermeasure network, electronic equipment and storage medium
CN110853095B (en) Camera positioning method and device, electronic equipment and storage medium
CN109840939B (en) Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and storage medium
CN112991381B (en) Image processing method and device, electronic equipment and storage medium
CN114387445A (en) Object key point identification method and device, electronic equipment and storage medium
CN114067085A (en) Virtual object display method and device, electronic equipment and storage medium
CN112613447B (en) Key point detection method and device, electronic equipment and storage medium
CN114581525B (en) Attitude determination method and device, electronic device and storage medium
CN114266305A (en) Object identification method and device, electronic equipment and storage medium
CN114445753A (en) Face tracking recognition method and device, electronic equipment and storage medium
CN109325908B (en) Image processing method and device, electronic equipment and storage medium
CN114550086B (en) A crowd positioning method and device, electronic device and storage medium
CN114550261A (en) Face recognition method and device, electronic equipment and storage medium
CN114638817B (en) Image segmentation method and device, electronic equipment and storage medium
CN112767288A (en) Image processing method and device, electronic equipment and storage medium
CN114463212A (en) Image processing method and device, electronic equipment and storage medium
WO2023273498A1 (en) Depth detection method and apparatus, electronic device, and storage medium
CN112906467A (en) Group photo image generation method and device, electronic device and storage medium
CN112330721A (en) Three-dimensional coordinate recovery method and device, electronic equipment and storage medium
CN112767541B (en) Three-dimensional reconstruction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant