
CN114638817B - Image segmentation method and device, electronic equipment and storage medium - Google Patents

Image segmentation method and device, electronic equipment and storage medium

Info

Publication number
CN114638817B
CN114638817B (application CN202210322862.0A)
Authority
CN
China
Prior art keywords
image
human body
target
movement
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210322862.0A
Other languages
Chinese (zh)
Other versions
CN114638817A (en)
Inventor
陈如婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202210322862.0A priority Critical patent/CN114638817B/en
Publication of CN114638817A publication Critical patent/CN114638817A/en
Application granted granted Critical
Publication of CN114638817B publication Critical patent/CN114638817B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract


This disclosure relates to an image segmentation method, apparatus, electronic device, and storage medium. The method includes: acquiring motion parameters corresponding to each human bounding box in a first global image, the motion parameters representing the motion pattern or trend of the human bounding box; scaling each human bounding box according to the motion parameters to obtain a target bounding box corresponding to the human bounding box; determining a first target image based on the target bounding boxes corresponding to each human bounding box and the first global image; and performing image segmentation on the first target image to obtain the human segmentation result of the first global image. Embodiments of this disclosure improve the image segmentation effect when the human body is moving.

Description

Image segmentation method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular, to an image segmentation method and device, an electronic device and a storage medium.
Background
Background segmentation is an important problem in the fields of computer vision and smart homes. Background segmentation models can be used in many fields; for example, in a home entertainment scene where multiple persons interact, the portraits and the background can be separated by a background segmentation model so that a monotonous background can be replaced. In background segmentation, a region of interest (e.g., a region including a portrait) is generally selected from a global image as a target image, and the target image is then subjected to image segmentation.
During image segmentation, the motion of the human body may cause the portrait to leave the target image, or may degrade the segmentation details of the portrait, resulting in a poor segmentation result. Therefore, how to improve the image segmentation effect while the human body is moving is an urgent problem to be solved.
Disclosure of Invention
The disclosure provides a technical scheme for an image segmentation method and apparatus, an electronic device, and a storage medium.
According to one aspect of the disclosure, an image segmentation method is provided, including: acquiring a movement parameter corresponding to each human body frame in a first global image, where the movement parameter represents a movement rule or movement trend of the human body frame; scaling each human body frame according to its movement parameter to obtain a target frame corresponding to the human body frame; determining a first target image according to the target frames corresponding to the human body frames and the first global image; and performing image segmentation on the first target image to obtain a human body segmentation result of the first global image.
The image segmentation method provided by the embodiment of the disclosure can be applied to image segmentation in a single-person scene as well as in a multi-person scene. In the embodiment of the disclosure, the size of the target image can be adjusted in real time according to the movement rule or movement trend of the human body, so that the target image keeps up with the movement of the human body in time. This reduces both the probability that the portrait leaves the target image and the probability that the proportion of pixels occupied by the portrait in the target image becomes too small while the human body moves, effectively improving the image segmentation effect when the human body is moving.
In one possible implementation manner, obtaining the movement parameter corresponding to each human body frame in the first global image includes: performing target detection on the first global image to obtain a target detection result of the first global image, where the target detection result indicates the position of each human body frame included in the first global image; determining, according to the target detection result, scene information of the first global image and distance information corresponding to each human body frame, where the scene information indicates whether the first global image is a global image in a single-person scene or in a multi-person scene, the distance information indicates the distance between the human body in the human body frame and a first image acquisition device, and the first image acquisition device is used to acquire the first global image; and, for each human body frame, obtaining the movement parameter corresponding to the human body frame according to the scene information, the distance information corresponding to the human body frame, and a first preset mapping relation, where the first preset mapping relation indicates the movement parameters corresponding to human body frames at different distances in different scenes.
In the embodiment of the disclosure, the movement parameters corresponding to the human body frames in the first global image are obtained based on the scene information of the first global image and the distance information corresponding to the human body frames in the first global image, so that conditions are provided for scaling of the human body frames, and the image segmentation effect during human body movement is improved.
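As a minimal sketch of the first preset mapping relation described above, the lookup can be keyed by scene type and a quantized distance. All bucket boundaries and parameter values below are illustrative assumptions, not values from the disclosure:

```python
# Sketch of the first preset mapping relation:
# (scene, distance bucket) -> movement parameter.
# Bucket boundaries and parameter values are illustrative assumptions.

def distance_bucket(distance_m: float) -> str:
    """Quantize the human-to-camera distance into coarse buckets."""
    if distance_m < 1.5:
        return "near"
    if distance_m < 3.0:
        return "mid"
    return "far"

# Larger movement parameters stand for faster / wider expected motion;
# a single-person scene is assigned larger values than a multi-person scene.
FIRST_PRESET_MAPPING = {
    ("single", "near"): 0.8,
    ("single", "mid"): 0.6,
    ("single", "far"): 0.4,
    ("multi", "near"): 0.4,
    ("multi", "mid"): 0.3,
    ("multi", "far"): 0.2,
}

def movement_parameter(scene: str, distance_m: float) -> float:
    """Look up the movement parameter for one human body frame."""
    return FIRST_PRESET_MAPPING[(scene, distance_bucket(distance_m))]
```

In this sketch the mapping is a static table; in practice it would be built from the tracked video statistics described below.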
In one possible implementation manner, the method further includes: acquiring a first video, where the first video corresponds to a first scene, the first scene is a single-person scene or a multi-person scene, the first video records the movement of a single person or of multiple persons within a first movement range, the first movement range is a first distance away from a second image acquisition device, and the second image acquisition device is used to acquire the first video; performing limb tracking on a target person in the first video to obtain the position of the human body frame corresponding to the target person in each frame image of the first video; determining a movement speed, a movement amplitude, and a second distance of the target person according to those positions, where the second distance indicates the distance between the target person and a reference position of the first video; obtaining a first movement parameter according to the movement speed and the movement amplitude of the target person; and establishing the first preset mapping relation based on the first scene, the second distance, and the first movement parameter.
In the embodiment of the disclosure, the historical images are analyzed through limb tracking, so that the mapping relation between the positions of the scene and the target person and the movement parameters is obtained, and a basis is provided for determining the movement parameters corresponding to the human frame in the first global image.
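The movement speed and movement amplitude mentioned above can be derived from the per-frame box positions produced by limb tracking. The statistics chosen below (mean step size for speed, positional range for amplitude, horizontal motion only) are illustrative assumptions:

```python
# Sketch of deriving movement speed and amplitude from the tracked per-frame
# box centers of the target person (pixel coordinates, fixed frame rate).
# The specific statistics are illustrative assumptions.

def movement_stats(centers_x, fps):
    """Return (speed, amplitude) of horizontal motion from per-frame x-centers.

    speed     : average per-frame displacement scaled to pixels/second
    amplitude : total positional range covered over the clip, in pixels
    """
    steps = [abs(b - a) for a, b in zip(centers_x, centers_x[1:])]
    speed = sum(steps) / len(steps) * fps if steps else 0.0
    amplitude = max(centers_x) - min(centers_x)
    return speed, amplitude
```

A first movement parameter could then be obtained by normalizing and combining these two values.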
In one possible implementation manner, the method further includes: when the first scene is a single-person scene, setting the movement frequency of the person in the first video to be greater than a first frequency threshold and the length of the projection of the first movement range in the x-axis direction of the camera coordinate system of the second image acquisition device to be greater than a first movement threshold; and when the first scene is a multi-person scene, setting the movement frequency of each person in the first video to be less than or equal to a second frequency threshold and the length of the projection of the first movement range in the x-axis direction of the camera coordinate system of the second image acquisition device to be less than or equal to a second movement threshold, where the second frequency threshold is less than or equal to the first frequency threshold and the second movement threshold is less than or equal to the first movement threshold.
In the embodiment of the disclosure, setting a larger movement range and movement frequency for a single-person scene and a smaller movement range and movement frequency for a multi-person scene makes the movement rule of a person fit the actual scene, which improves the accuracy of the movement parameters corresponding to the human body frames and thus the image segmentation effect during movement of the human body.
In one possible implementation manner, scaling the human body frame according to its movement parameter to obtain the corresponding target frame includes: determining a scaling coefficient of the human body frame according to the movement parameter corresponding to the human body frame and a second preset mapping relation, where the second preset mapping relation indicates the scaling coefficients corresponding to different movement parameters; and scaling the human body frame according to the scaling coefficient to obtain the target frame corresponding to the human body frame.
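A minimal sketch of this step follows. The linear form of the second preset mapping and the choice to scale the box about its center are illustrative assumptions:

```python
# Sketch of obtaining a target frame by scaling a human body frame.
# The formula for the coefficient and the center-anchored scaling
# are illustrative assumptions.

def scaling_coefficient(movement_param: float) -> float:
    """Second preset mapping: faster expected motion -> larger margin."""
    return 1.0 + 0.5 * movement_param  # assumed linear form

def scale_box(box, coeff):
    """Scale an (x1, y1, x2, y2) box about its center by `coeff`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = (x2 - x1) / 2.0 * coeff, (y2 - y1) / 2.0 * coeff
    return (cx - hw, cy - hh, cx + hw, cy + hh)
```

For example, a movement parameter of 1.0 would yield a coefficient of 1.5, enlarging a 10x10 box to 15x15 around the same center (the result would still need to be clipped to the image bounds).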
In a possible implementation manner, determining the first target image according to the target frames corresponding to the human body frames and the first global image includes: merging the target frames corresponding to the human body frames to obtain a merged frame; and obtaining the first target image according to the merged frame and the first global image, where the first target image corresponds to the merged frame with the smallest area among the merged frames capable of covering the target frames corresponding to all the human body frames.
In the embodiment of the disclosure, the first target image is obtained from the first global image based on the merged frame with the smallest area among the merged frames capable of covering the target frames corresponding to all the human body frames. In this way, the possibility that the proportion of pixels occupied by the portrait in the first target image is too low can be reduced, improving the image segmentation effect.
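For axis-aligned boxes, the smallest-area merged frame covering all target frames is simply their bounding union, which can be sketched as:

```python
# Sketch of the merge step: the axis-aligned union is the smallest rectangle
# covering all target frames, matching the "smallest-area merged frame"
# described above.

def merge_boxes(boxes):
    """Return the minimal axis-aligned box covering every (x1, y1, x2, y2) box."""
    x1 = min(b[0] for b in boxes)
    y1 = min(b[1] for b in boxes)
    x2 = max(b[2] for b in boxes)
    y2 = max(b[3] for b in boxes)
    return (x1, y1, x2, y2)
```

Cropping the first global image to this merged frame would then yield the first target image.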
In one possible implementation manner, performing image segmentation on the first target image to obtain the human body segmentation result of the first global image includes: acquiring a second target image corresponding to a second global image, where the second global image is the previous frame image of the first global image in a video and the second target image represents the target image adopted when the human body segmentation result of the second global image was obtained; determining the movement amplitude of the first target image relative to the second target image; and, when the movement amplitude of the first target image relative to the second target image is greater than a first amplitude threshold, performing image segmentation on the first target image to obtain the human body segmentation result of the first global image.
Therefore, the stability of the segmentation effect is improved by keeping the position of the target image relatively stable.
In one possible implementation manner, the method further includes: when the movement amplitude of the first target image relative to the second target image is less than or equal to the first amplitude threshold, performing image segmentation on the second target image to obtain the human body segmentation result of the first global image.
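This amplitude-based update rule can be sketched as follows. Measuring the movement amplitude as the shift of the target-image center is an illustrative assumption (the disclosure does not fix a particular distance measure):

```python
# Sketch of the update rule: reuse the previous target image unless the new
# target image has moved by more than the first amplitude threshold.
# Measuring movement as the center-point shift is an illustrative assumption.

def box_center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def should_update_target(prev_box, new_box, amplitude_threshold):
    """True when the new target image should replace the previous one."""
    (px, py), (nx, ny) = box_center(prev_box), box_center(new_box)
    amplitude = ((nx - px) ** 2 + (ny - py) ** 2) ** 0.5
    return amplitude > amplitude_threshold
```

Keeping the previous target image for small shifts avoids jitter in the crop position, which is the stability benefit noted above.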
In one possible implementation manner, performing image segmentation on the first target image to obtain the human body segmentation result of the first global image includes: acquiring a second target image corresponding to a second global image, where the second global image and the first global image belong to the same video, the second global image is the previous frame image of the first global image, and the second target image represents the target image adopted when the human body segmentation result of the second global image was obtained; determining the coverage rate of the first target image relative to the second target image; and, when the coverage rate is less than a second amplitude threshold, performing image segmentation on the first target image to obtain the human body segmentation result of the first global image.
Thus, the target image can be updated in time, and the image segmentation effect is improved.
In one possible implementation, the method further includes:
when the coverage rate is greater than or equal to the second amplitude threshold, performing image segmentation on the second target image to obtain the human body segmentation result of the first global image.
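The coverage-rate variant can be sketched as below. Defining the coverage rate as the intersection area divided by the new target image's area is an illustrative assumption:

```python
# Sketch of the coverage check: how much of the new target box already lies
# inside the previous target box. Defining coverage as intersection area over
# the new box's area is an illustrative assumption.

def coverage(new_box, prev_box):
    """Fraction of new_box's area covered by prev_box, in [0, 1]."""
    ix1, iy1 = max(new_box[0], prev_box[0]), max(new_box[1], prev_box[1])
    ix2, iy2 = min(new_box[2], prev_box[2]), min(new_box[3], prev_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (new_box[2] - new_box[0]) * (new_box[3] - new_box[1])
    return inter / area if area > 0 else 0.0

def should_resegment(new_box, prev_box, second_amplitude_threshold):
    """Segment the new target image only when coverage drops below the threshold."""
    return coverage(new_box, prev_box) < second_amplitude_threshold
```

Low coverage means the old crop no longer contains the person, so the target image is refreshed in time.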
According to an aspect of the present disclosure, there is provided an image segmentation apparatus including:
the first acquisition module, configured to acquire the movement parameter corresponding to each human body frame in a first global image, where the movement parameter represents the movement rule or movement trend of the human body frame;
the scaling module, configured to scale, for each human body frame, the human body frame according to the movement parameter acquired by the first acquisition module, to obtain the target frame corresponding to the human body frame;
the first determining module, configured to determine a first target image according to the target frames obtained by the scaling module and the first global image;
and the first segmentation module, configured to perform image segmentation on the first target image determined by the first determining module, to obtain a human body segmentation result of the first global image.
In one possible implementation manner, the first obtaining module is further configured to:
perform target detection on the first global image to obtain a target detection result of the first global image, where the target detection result indicates the position of each human body frame included in the first global image;
determine, according to the target detection result of the first global image, scene information of the first global image and distance information corresponding to each human body frame, where the scene information indicates whether the first global image is a global image in a single-person scene or in a multi-person scene, the distance information indicates the distance between the human body in the human body frame and a first image acquisition device, and the first image acquisition device is used to acquire the first global image;
and, for each human body frame, obtain the movement parameter corresponding to the human body frame according to the scene information, the distance information corresponding to the human body frame, and a first preset mapping relation, where the first preset mapping relation indicates the movement parameters corresponding to human body frames at different distances in different scenes.
In one possible implementation, the apparatus further includes:
the second acquisition module, configured to acquire a first video, where the first video corresponds to a first scene, the first scene is a single-person scene or a multi-person scene, the first video records the movement of a single person or of multiple persons within a first movement range, the first movement range is a first distance away from a second image acquisition device, and the second image acquisition device is used to acquire the first video;
the tracking module, configured to perform limb tracking on the target person in the first video to obtain the position of the human body frame corresponding to the target person in each frame image of the first video;
the second determining module, configured to determine the movement speed, the movement amplitude, and a second distance of the target person according to the positions of the human body frames corresponding to the target person in each frame image of the first video, where the second distance indicates the distance between the target person and the reference position of the first video;
the third acquisition module, configured to obtain a first movement parameter according to the movement speed and the movement amplitude of the target person;
the establishing module is configured to establish the first preset mapping relationship based on the first scene, the second distance, and the first movement parameter.
In one possible implementation, the apparatus further includes:
the first setting module, configured to set, when the first scene is a single-person scene, the movement frequency of the person in the first video to be greater than a first frequency threshold and the length of the projection of the first movement range in the x-axis direction of the camera coordinate system of the second image acquisition device to be greater than a first movement threshold;
the second setting module, configured to set, when the first scene is a multi-person scene, the movement frequency of each person in the first video to be less than or equal to a second frequency threshold and the length of the projection of the first movement range in the x-axis direction of the camera coordinate system of the second image acquisition device to be less than or equal to a second movement threshold;
wherein the second frequency threshold is less than or equal to the first frequency threshold and the second movement threshold is less than or equal to the first movement threshold.
In one possible implementation, the scaling module is further configured to:
determine a scaling coefficient of the human body frame according to the movement parameter corresponding to the human body frame and a second preset mapping relation, where the second preset mapping relation indicates the scaling coefficients corresponding to different movement parameters;
and scale the human body frame according to the scaling coefficient to obtain the target frame corresponding to the human body frame.
In one possible implementation manner, the first determining module is further configured to:
merge the target frames corresponding to the human body frames to obtain a merged frame;
and obtain the first target image according to the merged frame and the first global image,
where the first target image corresponds to the merged frame with the smallest area among the merged frames capable of covering the target frames corresponding to all the human body frames.
In one possible implementation, the first segmentation module is further configured to:
acquire a second target image corresponding to a second global image, where the second global image is the previous frame image of the first global image in a video and the second target image represents the target image adopted when the human body segmentation result of the second global image was obtained;
determine the movement amplitude of the first target image relative to the second target image;
and, when the movement amplitude of the first target image relative to the second target image is greater than a first amplitude threshold, perform image segmentation on the first target image to obtain the human body segmentation result of the first global image.
In one possible implementation, the apparatus further includes:
the second segmentation module, configured to perform image segmentation on the second target image when the movement amplitude of the first target image relative to the second target image is less than or equal to the first amplitude threshold, to obtain the human body segmentation result of the first global image.
In one possible implementation, the first segmentation module is further configured to:
acquire a second target image corresponding to a second global image, where the second global image and the first global image belong to the same video, the second global image is the previous frame image of the first global image, and the second target image represents the target image adopted when the human body segmentation result of the second global image was obtained;
determine the coverage rate of the first target image relative to the second target image;
and, when the coverage rate is less than a second amplitude threshold, perform image segmentation on the first target image to obtain the human body segmentation result of the first global image.
In one possible implementation, the apparatus further includes:
the third segmentation module, configured to perform image segmentation on the second target image when the coverage rate is greater than or equal to the second amplitude threshold, to obtain the human body segmentation result of the first global image.
According to an aspect of the disclosure, there is provided an electronic device comprising a processor and a memory for storing processor-executable instructions, wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
FIG. 1 illustrates a flow chart of an image segmentation method according to an embodiment of the present disclosure;
FIG. 2 illustrates an exemplary schematic diagram of a human body frame and a target frame in an embodiment of the present disclosure;
FIG. 3 illustrates an exemplary schematic diagram of a target box and merge box in an embodiment of the disclosure;
Fig. 4 shows a block diagram of an image segmentation apparatus according to an embodiment of the disclosure;
fig. 5 illustrates a block diagram of an electronic device 800, according to an embodiment of the disclosure;
Fig. 6 illustrates a block diagram of an electronic device 1900 according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
In scenarios such as home entertainment, online karaoke, online classes, and online meetings, a background matching the scene needs to be set to enhance immersion, so the human body and the background are segmented in the global image acquired by an image acquisition device (e.g., a device with a photo or video function, such as a camera, a video camera, a mobile phone, or a tablet). During segmentation, a target image containing a portrait (i.e., the image region corresponding to a human body) is generally selected from the global image and then input into a background segmentation model to segment the portrait from the background. The segmentation effect is related to the proportion of pixels occupied by the portrait in the target image: if the selected target image is too small, the portrait easily leaves the target image when the human body moves; if it is too large, the proportion of pixels occupied by the portrait becomes too low, easily leading to poor foreground segmentation details.
The image segmentation method provided by the embodiment of the disclosure can be applied to image segmentation in a single-person scene as well as in a multi-person scene. In the embodiment of the disclosure, the size of the target image can be adjusted in real time according to the movement rule or movement trend of the human body, so that the target image keeps up with the movement of the human body in time. This reduces both the probability that the portrait leaves the target image and the probability that the proportion of pixels occupied by the portrait in the target image becomes too small while the human body moves, effectively improving the image segmentation effect when the human body is moving.
Fig. 1 shows a flowchart of an image segmentation method according to an embodiment of the present disclosure. The image segmentation method may be performed by an electronic device such as a terminal device or a server, where the terminal device may be a user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc., and the method may be implemented by a processor invoking computer readable instructions stored in a memory. Alternatively, the method may be performed by a server. As shown in fig. 1, the image segmentation method includes:
In step S11, a movement parameter corresponding to each human frame in the first global image is acquired.
A global image is an image that contains a person. The global image may include one person or multiple persons, which is not limited by the embodiments of the present disclosure. The global image may be obtained after the image capturing device captures an image of a person in a certain spatial range, or may be an image frame including a person obtained from a video, or may be obtained by other means, which is not specifically limited in the embodiment of the present disclosure.
The first global image in step S11 may be used to represent the global image currently to be subjected to image segmentation. In the embodiment of the disclosure, after the first global image is input into the target detection model in the related art, the target detection model may output a target detection result of the first global image. The target detection result may be used to indicate a position of a human frame included in the first global image. The target detection model may be a convolutional neural network model, and the structure and the training process of the target detection model are not limited in the embodiment of the disclosure.
It can be understood that the human bodies contained in the first global image correspond one-to-one with the human body frames indicated by the target detection result. Therefore, when the first global image contains one human body, the movement parameter corresponding to one human body frame may be acquired in step S11; when the first global image contains a plurality of human bodies, the movement parameters corresponding to a plurality of human body frames may be acquired in step S11.
The movement parameters corresponding to the human body frame can be used for representing the movement rule or movement trend corresponding to the human body frame. The movement rule corresponding to the human body frame can be used for reflecting the movement condition of the human body based on the historical image analysis. The movement trend corresponding to the human body frame can be used for reflecting possible future movement conditions of the human body.
In one possible implementation, the movement parameters of the human frame include, but are not limited to, parameters for reflecting the movement of the human body, such as a movement speed, a movement amplitude, and a movement direction of the human frame. The specific process of obtaining the movement parameters corresponding to each human frame will be described in detail later in connection with possible implementation manners of the embodiments of the present disclosure, which will not be described herein.
In step S12, for each human frame, scaling the human frame according to the movement parameter to obtain a target frame corresponding to the human frame.
The movement of the human body relative to the image acquisition device may be decomposed into left-right movement and/or back-and-forth movement. When a human body moves left and right relative to the image acquisition device, if the size of the target image remains unchanged, the portrait may move out of the target image. When a human body approaches the image acquisition device, part of the body may leave the shooting view of the image acquisition device, and if the size of the target image remains unchanged, the target image may contain no portrait or only an incomplete portrait. When a human body moves away from the image acquisition device, if the size of the target image remains unchanged, the proportion of pixels occupied by the portrait in the target image may become too small. Considering that the movement parameters of a human body frame can represent the movement of the human body, in the embodiment of the disclosure the human body frame can be scaled according to its movement parameters to obtain the corresponding target frame, which on one hand reduces the possibility that the human body moves out of the target frame, and on the other hand reduces the possibility that the portrait occupies too small a proportion of the target frame.
Scaling the human frame in embodiments of the present disclosure includes contracting the human frame or expanding the human frame. The human body frame is expanded to obtain the target frame, so that the possibility that a human body moves out of the target frame can be reduced, the human body still stays in the target frame even if the human body moves correspondingly in a period of time, and the image segmentation effect is improved.
In one possible implementation manner, the step S12 of scaling the human frame according to the movement parameter corresponding to the human frame to obtain the target frame corresponding to the human frame includes determining a scaling factor of the human frame according to the movement parameter corresponding to the human frame and a second preset mapping relationship, and scaling the human frame according to the scaling factor to obtain the target frame corresponding to the human frame.
The second preset mapping relationship may be used to indicate scaling coefficients corresponding to different movement parameters. The second preset mapping relationship may be set as needed or empirically. For example, the larger the movement amplitude or the larger the movement speed, the larger the corresponding scaling factor. When the moving direction is far from the image acquisition equipment, the scaling coefficient is smaller than 1, and the larger the distance is, the smaller the scaling coefficient is, and when the moving direction is close to the image acquisition equipment, the scaling coefficient is larger than 1, and the larger the distance is, the larger the scaling coefficient is. Taking the moving direction as an example of being far away from the image acquisition equipment, namely, in the process that the human body gradually gets away from the image acquisition equipment, the ratio of the human image in the global image becomes smaller, the human body frame also needs to be correspondingly reduced, and as the human body moves from the near to the far away from the image acquisition equipment, the reduction amplitude of the human body frame also becomes larger, namely, the scaling factor for representing the reduction amplitude of the human body frame becomes smaller. Correspondingly, in the process that the moving direction is close to the image acquisition equipment, the occupation ratio of a human body in the global image is increased, the human body frame is enlarged, and the scaling factor is increased.
The image coordinate system of the global image takes the center of the global image as the origin of coordinates, with the x-axis parallel to the upper and lower sides of the global image and the y-axis parallel to the left and right sides. In one possible implementation, the scaling factor includes a scaling factor in the x-axis direction and a scaling factor in the y-axis direction of this image coordinate system. For example, in a single-person scene, the human body moves more left and right than back and forth, that is, the human body frame moves more in the x-axis direction than in the y-axis direction of the image coordinate system of the global image, so a larger scaling factor may be provided for the human body frame in the x-axis direction and a smaller one in the y-axis direction. In one example, when a person jumps left and right in front of the image acquisition device, the scaling factor in the x-axis direction of the image coordinate system of the global image may be 1.2 and the scaling factor in the y-axis direction may be 1.0.
Fig. 2 illustrates an exemplary schematic diagram of a human body frame and a target frame in an embodiment of the present disclosure. As shown in fig. 2, the scaling factor in the x-axis direction of the image coordinate system of the global image is 2, the scaling factor in the y-axis direction of the image coordinate system of the global image is 1.5, and after scaling, the length of the target frame is 2 times the length of the human frame in the x-axis direction of the image coordinate system of the global image, and the width of the target frame is 1.5 times the width of the human frame in the y-axis direction of the image coordinate system of the global image.
In one possible implementation, the scaling factor includes an expansion factor and a contraction factor. In the case where the scaling factor is greater than or equal to 1, the scaling factor may be referred to as an expansion factor, and expanding the human body frame at this time may result in a target frame, that is, an area of the target frame is greater than or equal to an area of the human body frame. In the case where the scaling factor is less than 1, the scaling factor may be referred to as a contraction factor, and the human body frame may be contracted at this time to obtain the target frame, that is, the area of the target frame is smaller than the area of the human body frame.
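The scaling of a human body frame about its center by per-axis factors can be sketched as follows; the (x_min, y_min, x_max, y_max) box format and the function name are assumptions for illustration, not part of the disclosed method:

```python
def scale_box(box, sx, sy):
    """Scale a detection box about its center.

    box: (x_min, y_min, x_max, y_max) in global-image pixel coordinates.
    sx, sy: scaling factors along the x and y axes; values >= 1 expand
    the box (expansion factor), values < 1 shrink it (contraction factor).
    """
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w = (x_max - x_min) / 2 * sx
    half_h = (y_max - y_min) / 2 * sy
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

# Example with the factors of Fig. 2: 2 in x, 1.5 in y.
target = scale_box((100, 100, 200, 300), sx=2.0, sy=1.5)
```

With sx = 2 and sy = 1.5 the resulting target frame is twice as long in the x-axis direction and 1.5 times as tall in the y-axis direction as the human body frame, matching the example of Fig. 2.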
In step S13, a first target image is determined according to the target frame corresponding to each human body frame and the first global image.
The first target image may represent an image that is subsequently used for image segmentation. In the embodiment of the disclosure, the first global image may be cut according to the positions of the target frames corresponding to the human frames, so as to obtain the first target image.
In a possible implementation manner, step S13 may include merging target frames corresponding to the human body frames to obtain a merged frame, and obtaining the first target image according to the merged frame and the first global image.
The merging frame represents the merging result of the target frames corresponding to the human frames. The first target image corresponds to a merging frame with the smallest area among merging frames capable of covering target frames corresponding to all human body frames.
Step S13 will be described below in connection with a single person scene and a multi-person scene, respectively.
In a single-person scenario, a movement parameter corresponding to one human body frame may be obtained in step S11, and a target frame corresponding to that human body frame may be obtained in step S12, so in step S13 the first target image may be cut out from the first global image according to the position of the target frame in the first global image.
In the multi-person scenario, movement parameters corresponding to a plurality of human body frames may be obtained in step S11, and target frames corresponding to the plurality of human body frames may be obtained in step S12, so in step S13 the target frames corresponding to the human body frames need to be combined into a merging frame, and the first target image is then cut out from the first global image according to the position of the merging frame. Considering that too low a proportion of pixels occupied by the portrait in the first target image may lead to poor foreground segmentation detail, the area of the merging frame should not be too large. Therefore, in the embodiment of the present disclosure, the first target image is cut from the first global image based on the merging frame with the smallest area among the merging frames capable of covering the target frames corresponding to all the human body frames. In this way, the possibility that the proportion of pixels occupied by the portrait in the first target image is too low can be reduced, and the image segmentation effect is improved.
It should be noted that, in a multi-person scenario, it is optional to combine multiple target frames to obtain a combined frame. That is, in the multi-person scene, after the target frames corresponding to the plurality of human frames are obtained, the target images may be obtained based on the respective target frames, and then the respective target images may be subjected to image segmentation, so that the human body segmentation result of the first global image may be obtained.
Fig. 3 shows an exemplary schematic diagram of a target box and a merge box in an embodiment of the disclosure. As shown in fig. 3, three target frames, each corresponding to one human body, are obtained based on the first global image. And after the three target frames are combined, a combined frame is obtained, and the combined frame can cover all the target frames. Taking two merging frames shown in fig. 3 as an example, a merging frame with the smallest area is selected from all the merging frames to perform image clipping, so that a first target image can be obtained.
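The smallest-area merging frame that covers all target frames is simply the axis-aligned bounding box of the target frames, which can be sketched as below; the (x_min, y_min, x_max, y_max) box format is an assumption for illustration:

```python
def merge_boxes(boxes):
    """Smallest axis-aligned merging frame covering every target frame.

    boxes: iterable of (x_min, y_min, x_max, y_max) target frames.
    """
    xs_min, ys_min, xs_max, ys_max = zip(*boxes)
    return (min(xs_min), min(ys_min), max(xs_max), max(ys_max))

# Three target frames, one per human body, as in Fig. 3.
merged = merge_boxes([(0, 0, 10, 10), (5, 5, 20, 15), (8, 2, 12, 30)])
```

Cropping the first global image to `merged` then yields the first target image covering all three portraits.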
In the related art, for a multi-person scene, it is necessary to acquire a target image for each person based on that person's target frame and then perform image segmentation processing on each of these target images. The number of pictures to be processed in a multi-person scene is therefore multiplied, which puts pressure on the chip performing the image segmentation processing: the computing power of the chip becomes insufficient, the processing speed drops, and meanwhile the user cannot run other functional modules on the chip in parallel, which greatly degrades the user experience.
In the embodiment of the disclosure, all target frames in a multi-person scene are combined to obtain a single target image, so only one target image needs to undergo image segmentation processing. Image segmentation in a multi-person scene is thereby reduced to a single segmentation pass, so the resources and time consumed by image segmentation in multi-person and single-person scenes are comparable, which improves efficiency, saves resources, and improves the user experience.
In step S14, image segmentation is performed on the first target image, so as to obtain a human body segmentation result of the first global image.
In the embodiment of the disclosure, after the first target image is input into the background segmentation model, a human body segmentation result of the first target image may be obtained, which indicates whether each pixel point in the first target image belongs to a human body or not. According to the human body segmentation result of the first target image and the position of the first target image in the first global image, the human body segmentation result of the first global image can be obtained, which indicates whether each pixel point in the first global image belongs to a human body or not. The background segmentation model may be implemented with reference to the related art, for example as a neural network model, and is not described in detail here.
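Mapping the target-image segmentation result back into the first global image can be sketched as pasting the target image's mask at its crop position; the boolean-mask output of the background segmentation model, the function name, and the array layout are assumptions for illustration:

```python
import numpy as np

def global_mask(target_mask, target_origin, global_shape):
    """Place the target image's human/non-human mask back into a
    full-size mask for the first global image.

    target_mask: (h, w) boolean array, True where a pixel is human.
    target_origin: (x_min, y_min) of the target image in the global image.
    global_shape: (H, W) of the first global image.
    """
    H, W = global_shape
    mask = np.zeros((H, W), dtype=bool)
    x0, y0 = target_origin
    h, w = target_mask.shape
    mask[y0:y0 + h, x0:x0 + w] = target_mask  # pixels outside stay non-human
    return mask
```

Pixels outside the target image are marked non-human directly, since the target frame was chosen to cover the portraits.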
In the embodiment of the disclosure, the size of the target image can be adjusted in real time according to the movement parameters of the human body, so that the target image can keep up with the movement of the human body in time, the probability that the human body leaves the target image in the movement process of the human body and the probability that the proportion of pixels occupied by the human body in the target image are too small are reduced, and the image segmentation effect when the human body moves is effectively improved.
Considering that, whether in a single-person scene or a multi-person scene, adopting a new target image for image segmentation every time the human body moves may make the segmentation effect unstable, smoothing processing may be performed on the target image in the embodiment of the present disclosure in order to keep the position of the target image relatively stable. The specific procedure of the smoothing process is described in detail below.
In one possible implementation manner, the step S14 may include acquiring a second target image corresponding to a second global image, determining a movement amplitude of the first target image relative to the second target image, and performing image segmentation on the first target image to obtain a human body segmentation result of the first global image when the movement amplitude of the first target image relative to the second target image is greater than a first amplitude threshold.
The second global image and the first global image belong to the same video, and the second global image is the previous frame image of the first global image. It is understood that the first global image and the second global image are the same in size, resolution, etc.
The second target image represents a target image adopted when the human body segmentation result of the second global image is acquired. And performing image segmentation processing on the second target image to obtain a human body segmentation result of the second global image. The process of acquiring the second target image may refer to the process of acquiring the first target image (step S11 to step S13), and will not be described herein.
In one possible implementation, the movement amplitude of the first target image relative to the second target image may be determined according to a coordinate difference between a preset position (e.g. the lower left corner vertex, the upper right corner vertex, or the center point) of the first target image and the same preset position of the second target image. In one example, the lower left corner vertex of the first target image is at coordinates (100, 100) in the first global image, the lower left corner vertex of the second target image is at coordinates (200, 100) in the second global image, and the movement amplitude of the first target image relative to the second target image is therefore 100 pixels.
The first amplitude threshold may be set as desired, for example, the first amplitude threshold may be set to 50 pixels or 150 pixels, or the like. When the movement amplitude of the first target image relative to the second target image is larger than the first amplitude threshold, the human body is shown to move in a larger amplitude, and at this time, in order to improve the image segmentation effect, the first target image can be subjected to image segmentation to obtain a human body segmentation result of the first global image.
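The smoothing decision above can be sketched as follows, assuming the preset position is the lower left corner vertex, a Euclidean pixel distance, and an illustrative threshold of 50 pixels; the function and constant names are assumptions:

```python
import math

def movement_amplitude(first_box, second_box):
    """Pixel distance between the preset positions (here: lower left
    corner vertices) of the first and second target images."""
    ax, ay = first_box[0], first_box[1]
    bx, by = second_box[0], second_box[1]
    return math.hypot(ax - bx, ay - by)

FIRST_AMPLITUDE_THRESHOLD = 50  # pixels, set as needed

def select_target(first_box, second_box):
    """Smoothing: reuse the previous target image for small movements."""
    if movement_amplitude(first_box, second_box) > FIRST_AMPLITUDE_THRESHOLD:
        return first_box   # large movement: segment the new target image
    return second_box      # small movement: keep the previous target image
```

In the 100-pixel example above, the amplitude exceeds the threshold, so the first target image is segmented.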
In a possible implementation manner, the method may further include performing image segmentation by using the second target image under the condition that the movement amplitude of the first target image relative to the second target image is smaller than or equal to the first amplitude threshold value, so as to obtain a human body segmentation result of the first global image.
When the moving amplitude of the first target image relative to the second target image is smaller than or equal to the first amplitude threshold, the moving amplitude of the human body is smaller, and at this time, in order to improve the stability of the image for image segmentation, the second target image can be subjected to image segmentation, so that a human body segmentation result of the first global image can be obtained.
Considering that the relative movement amplitude between the target images corresponding to adjacent global images is small in the case of slower movement of the human body, the target images adopted in image segmentation may not be updated timely. In order to update the target image in time, in the embodiment of the present disclosure, the update process may be performed on the target image. The specific procedure of the update process is described in detail below.
In a possible implementation manner, step S14 may include acquiring a second target image corresponding to a second global image, determining a coverage rate of the first target image with respect to the second target image, and performing image segmentation on the first target image to obtain a human body segmentation result of the first global image if the coverage rate is smaller than a second amplitude threshold.
In one example, a ratio of an overlapping area of the first target image and the second target image to an area of the second target image may be determined as a coverage of the first target image relative to the second target image.
The second amplitude threshold may be set as desired, for example, the second amplitude threshold may be 40% or 50%, or the like. Under the condition that the coverage rate of the first target image relative to the second target image is smaller than a second amplitude threshold, the human body is shown to move in a larger amplitude, and the target image for image segmentation needs to be updated, so that the first target image can be subjected to image segmentation, and the human body segmentation effect of the first global image can be obtained.
In a possible implementation manner, the method may further include performing image segmentation by using the second target image to obtain a human body segmentation result of the first global image when the coverage rate is greater than or equal to the second amplitude threshold.
When the coverage rate of the first target image relative to the second target image is greater than or equal to the second amplitude threshold, the movement amplitude is small, and at this time, in order to keep the portrait stable, the second target image can be subjected to image segmentation to obtain the human body segmentation result of the first global image. In this way, the target image used when segmenting the previous frame is reused when segmenting the current frame, so the same image is segmented in both frames, the corresponding segmentation result does not change, and the segmented portrait does not shake, which maintains the stability of the portrait and improves the user experience.
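The coverage computation from the example above (overlap area divided by the area of the second target image) can be sketched as below, assuming the (x_min, y_min, x_max, y_max) box format; the function name is an assumption:

```python
def coverage(first_box, second_box):
    """Overlap area of the two target images divided by the area of the
    second (previous) target image."""
    overlap_x = max(0, min(first_box[2], second_box[2]) - max(first_box[0], second_box[0]))
    overlap_y = max(0, min(first_box[3], second_box[3]) - max(first_box[1], second_box[1]))
    area_second = (second_box[2] - second_box[0]) * (second_box[3] - second_box[1])
    return overlap_x * overlap_y / area_second
```

A value below the second amplitude threshold (e.g. 40% or 50%) triggers an update to the first target image; otherwise the second target image is kept.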
The specific process of acquiring the movement parameter corresponding to each human frame in the first global image is described in detail below. In consideration of the fact that the first preset mapping relation for indicating the movement parameters corresponding to the human frames at different distances in different scenes is required to be used in the process, the process of acquiring the first preset mapping relation is described first.
The first preset mapping relation comprises a scene, a distance between a person in the global image and a reference position of the video and a movement parameter. In a possible implementation manner, the method further comprises the steps of obtaining a first video, carrying out limb tracking on a target person in the first video to obtain positions of human frames corresponding to the target person in each frame image of the first video, determining moving speed, moving amplitude and second distance of the target person according to the positions of the human frames corresponding to the target person in each frame image of the first video, obtaining a first moving parameter according to the moving speed and the moving amplitude of the target person, and establishing the first preset mapping relation based on the first scene, the second distance and the first moving parameter.
The first video corresponds to a first scene, which may be a single person scene or a multi-person scene, and is used for recording movement conditions of the single person or the multi-person in a first movement range, wherein the first movement range is a first distance from a second image acquisition device, and the second image acquisition device represents an image acquisition device for acquiring the first video.
In one example, after erection of the second image capturing device, movement may be performed by a person within a first range of movement at a first distance (e.g., 1 meter, 3 meters, or 5 meters, etc.) from the second image capturing device. The second image capturing device may capture a moving video of the person as the first video. And tracking the target person in the first video by using a limb tracking technology, so that the position of the human frame corresponding to the target person in each frame of image of the first video can be obtained. And determining the moving speed, the moving amplitude and the second distance of the target person according to the positions of the human frames corresponding to the target person in each frame of image of the first video, so as to obtain a first moving parameter. And establishing the first preset mapping relation based on the first scene, the second distance and the first movement parameter.
Wherein the second distance is used to indicate the distance of the target person from a reference position of the first video. Specifically, the distance between a preset position of the target person (e.g. the lower left corner vertex, the upper right corner vertex, or the center point of the corresponding human body frame) and the reference position of the first video may be determined as the second distance corresponding to the target person. The reference position of the first video may be a position pre-designated in the first video, such as the lower boundary line or the upper boundary line of the first video; correspondingly, the second distance may be the distance between the target person and the lower boundary line of the first video, or the distance between the target person and the upper boundary line of the first video. The size of the second distance can characterize the distance between the target person and the second image acquisition device. Taking the second distance indicating the distance between the target person and the lower boundary line of the first video as an example, the larger the second distance, the closer the target person is to the second image acquisition device, and the smaller the second distance, the farther the target person is from the second image acquisition device. Taking the second distance indicating the distance between the target person and the upper boundary line of the first video as an example, the larger the second distance, the farther the target person is from the second image acquisition device, and the smaller the second distance, the closer the target person is to the second image acquisition device.
In one possible implementation manner, the method further comprises: when the first scene is a single-person scene, setting the moving frequency of the person in the first video to be greater than a first frequency threshold and the length of the projection of the first movement range in the x-axis direction of the camera coordinate system of the second image acquisition device to be greater than a first movement threshold; and when the first scene is a multi-person scene, setting the moving frequency of the persons in the first video to be less than or equal to a second frequency threshold and the length of the projection of the first movement range in the x-axis direction of the camera coordinate system of the second image acquisition device to be less than or equal to a second movement threshold, where the second frequency threshold is less than or equal to the first frequency threshold and the second movement threshold is less than or equal to the first movement threshold. In one example, the first movement range may be rectangular, and the shortest distance between the lower edge of the first movement range and the second image acquisition device may be determined as the first distance. In yet another example, the first movement range may be a sector ring (i.e. a portion of a circular ring) centered on the second image acquisition device, and the inner radius of the first movement range may be determined as the first distance.
It can be understood that, because the movement of a person in a single-person scene is less restricted and the movement space is larger, the movement range of a person is larger in a single-person scene and smaller in a multi-person scene. Also, the smaller the first distance, the larger the movement amplitude and movement speed; the larger the first distance, the smaller the movement amplitude and movement speed. Therefore, the moving frequency and movement range set for the person in the first video of a single-person scene are larger than those set for the first video of a multi-person scene. In addition, the target person in a multi-person scene may be any one or more of the plurality of persons.
It should be noted that, the first frequency threshold, the first movement threshold, the second frequency threshold and the second movement threshold may be set according to needs, and only the second frequency threshold is required to be set to be smaller than or equal to the first frequency threshold, and the second movement threshold is required to be smaller than or equal to the first movement threshold.
Thus, a first preset mapping relation is obtained. On this basis, a process of acquiring a movement parameter corresponding to each human frame in the first global image is described.
In a possible implementation manner, the step S11 of obtaining the movement parameters corresponding to each human body frame in the first global image may include: performing target detection on the first global image to obtain a target detection result of the first global image; determining scene information of the first global image and distance information corresponding to each human body frame in the first global image according to the target detection result of the first global image; and, for each human body frame, determining the movement parameters corresponding to the human body frame according to the scene information, the distance information corresponding to the human body frame, and a first preset mapping relationship.
The target detection result may be used to indicate the positions of the human body frames contained in the first global image, the scene information may be used to indicate whether the first global image is a global image in a single-person scene or a global image in a multi-person scene, and the distance information may be used to indicate the distance between the human body in a human body frame and the first image acquisition device, where the first image acquisition device represents the image acquisition device that captured the first global image.
When the target detection result indicates the position of one human body frame, it may be determined that the first global image is a global image in a single-person scene. When the target detection result indicates the positions of a plurality of human body frames, it may be determined that the first global image is a global image in a multi-person scene.
In one example, the distance information corresponding to a human body frame may be determined from the position of the human body frame in the first global image. Specifically, the coordinate of a preset position of the human body frame (for example, the lower-left corner vertex, the upper-right corner vertex, or the center point) in the y-axis direction of the first global image may be determined as the distance information corresponding to the human body frame. Taking the coordinate system shown in fig. 2 as an example, the smaller the coordinate value of the center point of the human body frame in the y-axis direction of the first global image, the closer the human body is to the first image acquisition device; the larger the coordinate value, the farther the human body is from the first image acquisition device.
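As a minimal sketch of this example (assuming the fig. 2 convention that a smaller y value means a shorter distance to the camera, a box layout of (x1, y1, x2, y2), and the center point as the preset position), the distance information could be computed as:

```python
def distance_info(box):
    """Distance proxy for a human body frame given as (x1, y1, x2, y2).

    Uses the y-coordinate of the frame's center point: under the fig. 2
    coordinate system, a smaller value means the human body is closer
    to the first image acquisition device.
    """
    x1, y1, x2, y2 = box
    return (y1 + y2) / 2.0
```

Any other preset position (for example the lower-left corner vertex) would be substituted the same way, returning that point's y-coordinate instead.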
In the embodiment of the disclosure, the matched first preset mapping relation may be looked up according to the scene information and the distance information corresponding to the human body frame, and the movement parameter in the matched first preset mapping relation may be determined as the movement parameter corresponding to the human body frame.
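The lookup of the matched first preset mapping relation can be illustrated with a hypothetical table; the scene labels, distance bands, threshold value and movement-parameter names below are all illustrative assumptions, not values from the disclosure:

```python
# Hypothetical first preset mapping: (scene, distance band) -> movement parameter.
FIRST_PRESET_MAPPING = {
    ("single", "near"): "fast_large",
    ("single", "far"): "fast_small",
    ("multi", "near"): "slow_large",
    ("multi", "far"): "slow_small",
}

def movement_parameter(scene, distance, near_threshold=240.0):
    """Look up the movement parameter for one human body frame.

    `distance` is the distance information (e.g. the center-point
    y-coordinate); smaller values are treated as nearer to the camera.
    """
    band = "near" if distance < near_threshold else "far"
    return FIRST_PRESET_MAPPING[(scene, band)]
```

A real implementation might use finer distance bands, but the lookup pattern (scene plus distance band keying into a preset table) is the same.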
It may be understood that the above-mentioned method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from principles and logic; for brevity, details are not repeated in the present disclosure. It will be appreciated by those skilled in the art that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure further provides an image segmentation apparatus, an electronic device, a computer-readable storage medium, and a program, each of which may be used to implement any of the image segmentation methods provided in the present disclosure. For the corresponding technical solutions and descriptions, reference may be made to the corresponding descriptions of the method parts, which are not repeated here.
Fig. 4 shows a block diagram of an image segmentation apparatus according to an embodiment of the disclosure. As shown in fig. 4, the apparatus 40 includes:
the first obtaining module 41 is configured to obtain a movement parameter corresponding to each human frame in the first global image, where the movement parameter is used to represent a movement rule or a movement trend corresponding to the human frame;
A scaling module 42, configured to scale, for each human frame, the human frame according to the movement parameter acquired by the first acquiring module 41, to obtain a target frame corresponding to the human frame;
A first determining module 43, configured to determine a first target image according to the target frames corresponding to the human frames obtained by scaling by the scaling module 42 and the first global image;
The first segmentation module 44 is configured to perform image segmentation on the first target image determined by the first determination module 43, so as to obtain a human body segmentation result of the first global image.
In the embodiment of the disclosure, the size of the target image can be adjusted in real time according to the movement rule or movement trend of the human body, so that the target image can keep up with the movement of the human body in time, the probability that the human image leaves the target image and the probability that the proportion of pixels occupied by the human image in the target image is too small are reduced in the movement process of the human body, and the image segmentation effect when the human body moves is effectively improved.
In one possible implementation manner, the first obtaining module is further configured to:
performing target detection on the first global image to obtain a target detection result of the first global image, wherein the target detection result is used for indicating the position of a human frame included in the first global image;
Determining scene information of the first global image and distance information corresponding to each human frame in the first global image according to a target detection result of the first global image, wherein the scene information is used for indicating whether the first global image is a global image in a single person scene or a global image in a multiple person scene, the distance information is used for indicating the distance between a human body in the human frame and a first image acquisition device, and the first image acquisition device is used for acquiring the first global image;
And aiming at each human body frame, obtaining the movement parameters corresponding to the human body frame according to the scene information, the distance information corresponding to the human body frame and a first preset mapping relation, wherein the first preset mapping relation is used for indicating the movement parameters corresponding to the human body frames with different distances under different scenes.
In one possible implementation, the apparatus further includes:
the second acquisition module is used for acquiring a first video, wherein the first video corresponds to a first scene, the first scene is a single-person scene or a multi-person scene, the first video is used for recording the movement of a single person or a plurality of persons within a first movement range, the first movement range is at a first distance from a second image acquisition device, and the second image acquisition device is used for acquiring the first video;
The tracking module is used for carrying out limb tracking on the target person in the first video to obtain the position of a human frame corresponding to the target person in each frame of image of the first video;
The second determining module is used for determining the moving speed, the moving amplitude and a second distance of the target person according to the positions of the human frames corresponding to the target person in each frame image of the first video, wherein the second distance is used for indicating the distance between the target person and the reference position of the first video;
the third acquisition module is used for acquiring a first movement parameter according to the movement speed and the movement amplitude of the target person;
the establishing module is configured to establish the first preset mapping relationship based on the first scene, the second distance, and the first movement parameter.
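One way the second determining module's statistics could be derived from the tracked per-frame positions is sketched below; treating horizontal displacement as the movement measure and the mean center y-coordinate as the second-distance proxy are assumptions for illustration, not the disclosure's prescribed formulas:

```python
def movement_stats(centers, fps):
    """Derive movement speed, movement amplitude and a second-distance proxy
    from per-frame human body frame center points of the target person.

    centers: list of (x, y) center points, one per frame of the first video.
    fps: frame rate of the first video.
    """
    xs = [c[0] for c in centers]
    # Movement amplitude: horizontal span covered by the target person.
    amplitude = max(xs) - min(xs)
    # Movement speed: mean per-frame horizontal displacement, scaled to per second.
    total = sum(abs(xs[i + 1] - xs[i]) for i in range(len(xs) - 1))
    speed = total * fps / max(len(xs) - 1, 1)
    # Second distance: mean y-coordinate of the centers as a distance proxy.
    second_distance = sum(c[1] for c in centers) / len(centers)
    return speed, amplitude, second_distance
```

The first movement parameter would then be read off from these speed and amplitude values, and the (first scene, second distance, first movement parameter) triple recorded as one entry of the first preset mapping relation.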
In one possible implementation, the apparatus further includes:
the first setting module is used for setting that the moving frequency of a person in a first video is larger than a first frequency threshold value and the length of projection of the first moving range in the x-axis direction of a camera coordinate system of the second image acquisition device is larger than a first moving threshold value when the first scene is a single person scene;
the second setting module is used for setting that the moving frequency of the person in the first video is smaller than or equal to a second frequency threshold value and the length of the projection of the first moving range in the x-axis direction of the camera coordinate system of the second image acquisition device is smaller than or equal to a second moving threshold value under the condition that the first scene is a multi-person scene;
wherein the second frequency threshold is less than or equal to the first frequency threshold and the second movement threshold is less than or equal to the first movement threshold.
In one possible implementation, the scaling module is further configured to:
determining a scaling factor of the human body frame according to the movement parameter corresponding to the human body frame and a second preset mapping relation;
and scaling the human body frame according to the scaling coefficient to obtain a target frame corresponding to the human body frame, wherein the second preset mapping relation is used for indicating scaling coefficients corresponding to different movement parameters.
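A sketch of this scaling step, with a hypothetical second preset mapping (the coefficients and movement-parameter names are illustrative assumptions):

```python
# Hypothetical second preset mapping: movement parameter -> scaling coefficient.
SECOND_PRESET_MAPPING = {
    "fast_large": 1.5,
    "fast_small": 1.3,
    "slow_large": 1.2,
    "slow_small": 1.1,
}

def scale_frame(box, movement_param):
    """Scale a human body frame (x1, y1, x2, y2) about its center point
    by the coefficient mapped from the movement parameter, yielding the
    target frame."""
    k = SECOND_PRESET_MAPPING[movement_param]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * k / 2.0, (y2 - y1) * k / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

Larger coefficients for faster or wider-ranging movement leave more margin around the human body, which is what lets the target frame keep up with motion between frames.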
In one possible implementation manner, the first determining module is further configured to:
merging the target frames corresponding to the human body frames to obtain a merged frame;
obtaining the first target image according to the merged frame and the first global image;
wherein the first target image corresponds to the merged frame with the smallest area among the frames capable of covering the target frames corresponding to all the human body frames.
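The merging step can be sketched as taking the smallest axis-aligned frame covering all target frames (assuming frames are given as (x1, y1, x2, y2) tuples):

```python
def merge_frames(target_frames):
    """Smallest-area axis-aligned frame covering every target frame."""
    xs1, ys1, xs2, ys2 = zip(*target_frames)
    return (min(xs1), min(ys1), max(xs2), max(ys2))
```

In a single-person scene this reduces to the one target frame itself; in a multi-person scene it is the tightest frame enclosing all of them.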
In one possible implementation, the first segmentation module is further configured to:
Acquiring a second target image corresponding to a second global image, wherein the second global image is a previous frame image of the first global image in a video, and the second target image represents a target image adopted when a human body segmentation result of the second global image is acquired;
Determining a movement amplitude of the first target image relative to the second target image;
and under the condition that the moving amplitude of the first target image relative to the second target image is larger than a first amplitude threshold, image segmentation is carried out on the first target image, and a human body segmentation result of the first global image is obtained.
In one possible implementation, the apparatus further includes:
And the second segmentation module is used for carrying out image segmentation by adopting the second target image under the condition that the movement amplitude of the first target image relative to the second target image is smaller than or equal to the first amplitude threshold value, so as to obtain the human body segmentation result of the first global image.
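The decision between segmenting the new first target image and reusing the second target image could look like the following sketch; measuring the movement amplitude as the largest shift of the top-left corner coordinates is an assumption for illustration:

```python
def choose_target_image(first_region, second_region, first_amplitude_threshold):
    """Decide whether to segment the new first target image or reuse the
    second target image (the one used for the previous frame).

    Regions are (x1, y1, x2, y2); the movement amplitude is taken here as
    the larger shift of the top-left corner along either axis.
    """
    dx = abs(first_region[0] - second_region[0])
    dy = abs(first_region[1] - second_region[1])
    amplitude = max(dx, dy)
    if amplitude > first_amplitude_threshold:
        return "segment_first_target_image"
    return "reuse_second_target_image"
```

Reusing the previous target image when the region has barely moved avoids re-cropping and re-running segmentation on nearly identical inputs.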
In one possible implementation, the first segmentation module is further configured to:
acquiring a second target image corresponding to a second global image, wherein the second global image and the first global image belong to the same video, the second global image is a previous frame image of the first global image, and the second target image represents a target image adopted when a human body segmentation result of the second global image is acquired;
Determining a coverage of the first target image relative to the second target image;
and under the condition that the coverage rate is smaller than a second amplitude threshold value, performing image segmentation on the first target image to obtain a human body segmentation result of the first global image.
In one possible implementation, the apparatus further includes:
And the third segmentation module is used for carrying out image segmentation by adopting the second target image under the condition that the coverage rate is larger than or equal to the second amplitude threshold value to obtain a human body segmentation result of the first global image.
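The coverage rate used by this implementation can be sketched as the fraction of the first target image's area overlapped by the second target image (a region layout of (x1, y1, x2, y2) is assumed):

```python
def coverage_rate(first_region, second_region):
    """Fraction of the first target image covered by the second target image."""
    ax1, ay1, ax2, ay2 = first_region
    bx1, by1, bx2, by2 = second_region
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    area_first = (ax2 - ax1) * (ay2 - ay1)
    return (inter_w * inter_h) / area_first if area_first > 0 else 0.0
```

A coverage rate at or above the second amplitude threshold indicates the previous target image still contains most of the new region, so it may be reused; below the threshold, the first target image is segmented afresh.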
The method has a specific technical association with the internal structure of a computer system, and can solve technical problems of improving hardware operation efficiency or execution effect (including reducing the amount of data stored, reducing the amount of data transmitted, increasing the hardware processing speed, and the like), thereby obtaining a technical effect of improving the internal performance of the computer system in accordance with the laws of nature.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiment of the disclosure also provides electronic equipment, which comprises a processor and a memory for storing instructions executable by the processor, wherein the processor is configured to call the instructions stored by the memory so as to execute the method.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
The electronic device may be provided as a terminal, server or other form of device.
Fig. 5 illustrates a block diagram of an electronic device 800, according to an embodiment of the disclosure. For example, the electronic device 800 may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like.
Referring to FIG. 5, the electronic device 800 can include one or more of a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to, a home button, a volume button, an activate button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessment of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an on/off state of the electronic device 800, a relative positioning of the components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of a user's contact with the electronic device 800, an orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a photosensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communication between the electronic device 800 and other devices, either wired or wireless. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (Wi-Fi), a second generation mobile communication technology (2G), a third generation mobile communication technology (3G), a fourth generation mobile communication technology (4G), long Term Evolution (LTE) of a universal mobile communication technology, a fifth generation mobile communication technology (5G), or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including computer program instructions executable by processor 820 of electronic device 800 to perform the above-described methods.
The present disclosure relates to the field of augmented reality, and more particularly, to the field of augmented reality, in which, by acquiring image information of a target object in a real environment, detection or identification processing of relevant features, states and attributes of the target object is further implemented by means of various visual correlation algorithms, so as to obtain an AR effect combining virtual and reality matching with a specific application. By way of example, the target object may relate to a face, limb, gesture, action, etc. associated with a human body, or a marker, a marker associated with an object, or a sand table, display area, or display item associated with a venue or location, etc. Vision related algorithms may involve vision localization, SLAM, three-dimensional reconstruction, image registration, background segmentation, key point extraction and tracking of objects, pose or depth detection of objects, and so forth. The specific application not only can relate to interactive scenes such as navigation, explanation, reconstruction, virtual effect superposition display and the like related to real scenes or articles, but also can relate to interactive scenes such as makeup beautification, limb beautification, special effect display, virtual model display and the like related to people. The detection or identification processing of the relevant characteristics, states and attributes of the target object can be realized through a convolutional neural network. The convolutional neural network is a network model obtained by performing model training based on a deep learning framework.
Fig. 6 illustrates a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server or terminal device. Referring to FIG. 6, electronic device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical user interface-based operating system promoted by Apple Inc. (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical encoding device, punch cards or intra-groove protrusion structures such as those having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
The computer program instructions for performing the operations of the present disclosure may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of computer readable program instructions, which can execute the computer readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
The foregoing description of various embodiments is intended to highlight differences between the various embodiments, which may be the same or similar to each other by reference, and is not repeated herein for the sake of brevity.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
If the technical solution of the application relates to personal information, a product applying the technical solution of the application clearly informs the user of the personal information processing rules and obtains the individual's voluntary consent before processing the personal information. If the technical solution of the application relates to sensitive personal information, a product applying the technical solution of the application obtains the individual's separate consent before processing the sensitive personal information, and at the same time meets the requirement of "explicit consent". For example, a clear and conspicuous sign is set up at a personal information collection device such as a camera to inform the user that he or she is entering the personal information collection range and that personal information will be collected; if the individual voluntarily enters the collection range, this is regarded as consent to the collection of his or her personal information. Alternatively, a conspicuous sign or notice on the personal information processing device informs the user of the personal information processing rules, and personal authorization is obtained through pop-up information or by the individual uploading his or her personal information. The personal information processing rules may include information such as the personal information processor, the purpose of the personal information processing, the processing method, and the types of personal information processed.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the embodiments described. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. An image segmentation method, characterized in that the method comprises:
obtaining a movement parameter corresponding to each human body frame in a first global image, the movement parameter representing a movement pattern or movement trend of the human body frame;
for each human body frame, scaling the human body frame according to the movement parameter to obtain a target frame corresponding to the human body frame;
determining a first target image according to the target frames corresponding to the human body frames and the first global image; and
performing image segmentation on the first target image to obtain a human body segmentation result of the first global image;
wherein obtaining the movement parameter corresponding to each human body frame in the first global image comprises:
performing target detection on the first global image to obtain a target detection result of the first global image, the target detection result indicating positions of the human body frames included in the first global image;
determining, according to the target detection result, scene information of the first global image and distance information corresponding to each human body frame in the first global image, the scene information indicating whether the first global image is a global image of a single-person scene or of a multi-person scene, and the distance information indicating a distance between the human body in the human body frame and a first image acquisition device used to acquire the first global image; and
for each human body frame, obtaining the movement parameter corresponding to the human body frame according to the scene information, the distance information corresponding to the human body frame, and a first preset mapping relationship, the first preset mapping relationship indicating movement parameters corresponding to human body frames at different distances in different scenes.

2. The method according to claim 1, characterized in that the method further comprises:
acquiring a first video corresponding to a first scene, the first scene being a single-person scene or a multi-person scene, and the first video recording the movement of one or more persons within a first movement range, wherein the first movement range is a first distance from a second image acquisition device used to acquire the first video;
performing body tracking on a target person in the first video to obtain the position of the human body frame corresponding to the target person in each frame of the first video;
determining, according to the positions of the human body frame corresponding to the target person in the frames of the first video, the movement speed and movement amplitude of the target person and a second distance, the second distance indicating the distance between the target person and a reference position of the first video;
obtaining a first movement parameter according to the movement speed and movement amplitude of the target person; and
establishing the first preset mapping relationship based on the first scene, the second distance, and the first movement parameter.

3. The method according to claim 2, characterized in that the method further comprises:
in the case that the first scene is a single-person scene, setting the movement frequency of the person in the first video to be greater than a first frequency threshold, and the length of the projection of the first movement range onto the x-axis of the camera coordinate system of the second image acquisition device to be greater than a first movement threshold; and
in the case that the first scene is a multi-person scene, setting the movement frequency of the persons in the first video to be less than or equal to a second frequency threshold, and the length of the projection of the first movement range onto the x-axis of the camera coordinate system of the second image acquisition device to be less than or equal to a second movement threshold;
wherein the second frequency threshold is less than or equal to the first frequency threshold, and the second movement threshold is less than or equal to the first movement threshold.

4. The method according to any one of claims 1 to 3, characterized in that scaling the human body frame according to the movement parameter corresponding to the human body frame to obtain the target frame corresponding to the human body frame comprises:
determining a scaling factor for the human body frame according to the movement parameter corresponding to the human body frame and a second preset mapping relationship, the second preset mapping relationship indicating scaling factors corresponding to different movement parameters; and
scaling the human body frame by the scaling factor to obtain the target frame corresponding to the human body frame.

5. The method according to any one of claims 1 to 3, characterized in that determining the first target image according to the target frames corresponding to the human body frames and the first global image comprises:
merging the target frames corresponding to the human body frames to obtain a merged frame; and
obtaining the first target image according to the merged frame and the first global image;
wherein the first target image corresponds to the merged frame of smallest area among the merged frames capable of covering the target frames corresponding to all the human body frames.

6. The method according to any one of claims 1 to 3, characterized in that performing image segmentation on the first target image to obtain the human body segmentation result of the first global image comprises:
obtaining a second target image corresponding to a second global image, the second global image being the frame preceding the first global image in a video, and the second target image being the target image used when obtaining the human body segmentation result of the second global image;
determining the movement amplitude of the first target image relative to the second target image; and
in the case that the movement amplitude of the first target image relative to the second target image is greater than a first amplitude threshold, performing image segmentation on the first target image to obtain the human body segmentation result of the first global image.

7. The method according to claim 6, characterized in that the method further comprises:
in the case that the movement amplitude of the first target image relative to the second target image is less than or equal to the first amplitude threshold, performing image segmentation using the second target image to obtain the human body segmentation result of the first global image.

8. The method according to any one of claims 1 to 3, characterized in that performing image segmentation on the first target image to obtain the human body segmentation result of the first global image comprises:
obtaining a second target image corresponding to a second global image, the second global image belonging to the same video as the first global image and being the frame preceding the first global image, and the second target image being the target image used when obtaining the human body segmentation result of the second global image;
determining the coverage of the first target image relative to the second target image; and
in the case that the coverage is less than a second amplitude threshold, performing image segmentation on the first target image to obtain the human body segmentation result of the first global image.

9. The method according to claim 8, characterized in that the method further comprises:
in the case that the coverage is greater than or equal to the second amplitude threshold, performing image segmentation using the second target image to obtain the human body segmentation result of the first global image.

10. An image segmentation apparatus, characterized in that the apparatus comprises:
a first acquisition module, configured to obtain a movement parameter corresponding to each human body frame in a first global image, the movement parameter representing a movement pattern or movement trend of the human body frame;
a scaling module, configured to scale each human body frame according to the movement parameter obtained by the first acquisition module to obtain a target frame corresponding to the human body frame;
a first determining module, configured to determine a first target image according to the target frames obtained by the scaling module and the first global image; and
a first segmentation module, configured to perform image segmentation on the first target image determined by the first determining module to obtain a human body segmentation result of the first global image;
wherein the first acquisition module is further configured to:
perform target detection on the first global image to obtain a target detection result of the first global image, the target detection result indicating positions of the human body frames included in the first global image;
determine, according to the target detection result, scene information of the first global image and distance information corresponding to each human body frame in the first global image, the scene information indicating whether the first global image is a global image of a single-person scene or of a multi-person scene, and the distance information indicating a distance between the human body in the human body frame and a first image acquisition device used to acquire the first global image; and
for each human body frame, obtain the movement parameter corresponding to the human body frame according to the scene information, the distance information corresponding to the human body frame, and a first preset mapping relationship, the first preset mapping relationship indicating movement parameters corresponding to human body frames at different distances in different scenes.

11. An electronic device, characterized in that it comprises:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to execute the method according to any one of claims 1 to 9.

12. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 9.
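Claims 1, 4, and 5 can be read as a small piece of box geometry: each detected human body frame is enlarged by a factor looked up from its movement parameter, and the enlarged boxes are merged into the smallest rectangle covering them all. A minimal sketch in Python, where the `(x1, y1, x2, y2)` box format and the movement-to-scale table are illustrative assumptions, not values from the patent:

```python
def scale_box(box, factor):
    """Scale an (x1, y1, x2, y2) box about its center by `factor` (claim 4)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    hw, hh = (x2 - x1) / 2 * factor, (y2 - y1) / 2 * factor
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def merge_boxes(boxes):
    """Smallest axis-aligned box covering all input boxes (claim 5)."""
    xs1, ys1, xs2, ys2 = zip(*boxes)
    return (min(xs1), min(ys1), max(xs2), max(ys2))

# Hypothetical second preset mapping: faster movement -> larger margin.
SCALE_BY_MOVEMENT = {"slow": 1.1, "medium": 1.25, "fast": 1.5}

detections = [((10, 10, 30, 50), "slow"), ((40, 20, 60, 60), "fast")]
targets = [scale_box(b, SCALE_BY_MOVEMENT[m]) for b, m in detections]
merged = merge_boxes(targets)  # crop region for the first target image, ~ (9, 8, 65, 70)
```

The merged rectangle is then used to crop the first target image from the first global image, so the segmentation network only processes the region people can plausibly move into.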
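Claim 2 derives a movement speed and a movement amplitude from the tracked box positions of a target person across the frames of the calibration video. One plausible reading of that statistic, sketched here using only the horizontal box centers (the actual method may use both axes and different definitions):

```python
def movement_stats(track):
    """track: list of (x1, y1, x2, y2) boxes, one per frame.

    Returns (mean per-frame speed, total amplitude) of the horizontal
    box center -- an assumed reading of the speed/amplitude in claim 2.
    """
    xs = [(x1 + x2) / 2 for x1, _, x2, _ in track]
    speeds = [abs(b - a) for a, b in zip(xs, xs[1:])]
    speed = sum(speeds) / len(speeds) if speeds else 0.0
    amplitude = max(xs) - min(xs)
    return speed, amplitude

# A person drifting steadily right by 10 px per frame:
stats = movement_stats([(0, 0, 10, 10), (10, 0, 20, 10), (20, 0, 30, 10)])
# stats == (10.0, 20.0)
```

Per claim 2, values like these, collected per scene and per distance band, would be folded into the first preset mapping relationship that claim 1 later looks up at inference time.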
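Claims 8 and 9 reuse the previous frame's crop as long as it still covers enough of the new target region, re-cropping and re-segmenting only when coverage drops below a threshold. A sketch of that decision, where the coverage definition (intersection area over current-box area) and the 0.9 threshold are assumptions for illustration:

```python
def coverage(curr, prev):
    """Fraction of `curr`'s area that `prev` covers; boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(curr[0], prev[0]), max(curr[1], prev[1])
    ix2, iy2 = min(curr[2], prev[2]), min(curr[3], prev[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (curr[2] - curr[0]) * (curr[3] - curr[1])
    return inter / area if area else 0.0

def choose_crop(curr, prev, threshold=0.9):
    """Claims 8-9: keep the previous crop while it still covers the new one."""
    return prev if coverage(curr, prev) >= threshold else curr
```

Keeping the old crop when coverage is high avoids jitter in the segmented region between consecutive video frames; only a substantial shift of the subjects forces a new crop.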
CN202210322862.0A 2022-03-29 2022-03-29 Image segmentation method and device, electronic equipment and storage medium Active CN114638817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210322862.0A CN114638817B (en) 2022-03-29 2022-03-29 Image segmentation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210322862.0A CN114638817B (en) 2022-03-29 2022-03-29 Image segmentation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114638817A CN114638817A (en) 2022-06-17
CN114638817B true CN114638817B (en) 2025-11-21

Family

ID=81951284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210322862.0A Active CN114638817B (en) 2022-03-29 2022-03-29 Image segmentation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114638817B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119902732A (en) * 2023-10-25 2025-04-29 广州视源电子科技股份有限公司 Character close-up method, device, equipment and storage medium based on conference tablet
CN119323579A (en) * 2024-09-29 2025-01-17 浪潮智慧科技有限公司 Method, system, terminal and medium for dividing image to generate jigsaw cutting path

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062761A (en) * 2017-12-25 2018-05-22 北京奇虎科技有限公司 Image partition method, device and computing device based on adaptive tracing frame
CN112019868A (en) * 2019-05-31 2020-12-01 广州虎牙信息科技有限公司 Portrait segmentation method and device and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005223487A (en) * 2004-02-04 2005-08-18 Mainichi Broadcasting System Inc Digital camera work apparatus, digital camera work method, and digital camera work program
US10102635B2 (en) * 2016-03-10 2018-10-16 Sony Corporation Method for moving object detection by a Kalman filter-based approach
TWI711007B (en) * 2019-05-02 2020-11-21 緯創資通股份有限公司 Method and computing device for adjusting region of interest

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062761A (en) * 2017-12-25 2018-05-22 北京奇虎科技有限公司 Image partition method, device and computing device based on adaptive tracing frame
CN112019868A (en) * 2019-05-31 2020-12-01 广州虎牙信息科技有限公司 Portrait segmentation method and device and electronic equipment

Also Published As

Publication number Publication date
CN114638817A (en) 2022-06-17

Similar Documents

Publication Publication Date Title
US11288531B2 (en) Image processing method and apparatus, electronic device, and storage medium
CN111626183B (en) Target object display method and device, electronic equipment and storage medium
CN109584362B (en) Three-dimensional model construction method and device, electronic equipment and storage medium
CN109840917B (en) Image processing method and device and network training method and device
CN111401230B (en) Gesture estimation method and device, electronic equipment and storage medium
CN113822798B (en) Method and device for training generation countermeasure network, electronic equipment and storage medium
CN110853095B (en) Camera positioning method and device, electronic equipment and storage medium
CN109840939B (en) Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and storage medium
CN112991381B (en) Image processing method and device, electronic equipment and storage medium
CN114387445A (en) Object key point identification method and device, electronic equipment and storage medium
CN114067085A (en) Virtual object display method and device, electronic equipment and storage medium
CN112613447B (en) Key point detection method and device, electronic equipment and storage medium
CN114581525B (en) Attitude determination method and device, electronic device and storage medium
CN114266305A (en) Object identification method and device, electronic equipment and storage medium
CN114445753A (en) Face tracking recognition method and device, electronic equipment and storage medium
CN109325908B (en) Image processing method and device, electronic equipment and storage medium
CN114550086B (en) A crowd positioning method and device, electronic device and storage medium
CN114550261A (en) Face recognition method and device, electronic equipment and storage medium
CN114638817B (en) Image segmentation method and device, electronic equipment and storage medium
CN112767288A (en) Image processing method and device, electronic equipment and storage medium
CN114463212A (en) Image processing method and device, electronic equipment and storage medium
WO2023273498A1 (en) Depth detection method and apparatus, electronic device, and storage medium
CN112906467A (en) Group photo image generation method and device, electronic device and storage medium
CN112330721A (en) Three-dimensional coordinate recovery method and device, electronic equipment and storage medium
CN112767541B (en) Three-dimensional reconstruction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant