Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In particular implementations, the mobile terminals described in embodiments of the present application include, but are not limited to, other portable devices such as mobile phones, laptop computers, or tablet computers having touch sensitive surfaces (e.g., touch screen displays and/or touch pads). It should also be understood that in some embodiments, the devices described above are not portable communication devices, but rather are desktop computers having touch-sensitive surfaces (e.g., touch screen displays and/or touch pads).
In the discussion that follows, a mobile terminal that includes a display and a touch-sensitive surface is described. However, it should be understood that the mobile terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The mobile terminal supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the mobile terminal may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical architecture (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
In addition, in the description of the present application, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Example one:
Fig. 1 shows a flowchart of a first target image segmentation method provided in an embodiment of the present application, detailed as follows:
Step S11, acquiring a video image;
The video image of the present embodiment may be a video image that has already been captured or a video image that is being captured.
It should be noted that, when the user combines a plurality of captured pictures into one video stream, the picture corresponding to the video stream is also the video image of the present embodiment.
Step S12, segmenting the 1st frame of the video image by adopting a lightweight convolutional neural network to obtain a target image;
The target image may be a target image in the foreground or a target image in the background, and is set according to user requirements. In this step, the lightweight convolutional neural network has been trained in advance to identify the specified target image, so that the specified target image can be identified when the trained network is used to segment the 1st frame of the video image. Because the lightweight convolutional neural network has a high recognition speed, the target image of the 1st frame can be identified quickly.
Optionally, the lightweight convolutional neural network here is MobileUnet.
Step S13, tracking the target image of the subsequent frame of the video image by adopting a preset target tracking method to obtain the tracking area of the target image;
In this embodiment, the tracking area of the target image is an area that contains the target image. As shown in fig. 2, assuming that the target image is a person, a sheep, or a dog, the area enclosed by the box in which the person, the sheep, or the dog is located is the tracking area of that person, sheep, or dog.
The preset target tracking method is a traditional target tracking method, such as Kalman filtering, extended Kalman filtering, particle filtering, and the like.
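As one illustration of the traditional trackers named above, the following minimal sketch runs a constant-velocity Kalman filter on a single coordinate of the centre of the tracking area. The state model, the noise values `q` and `r`, and the class name `Kalman1D` are assumptions chosen for illustration; the present application does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate (hypothetical helper)."""
    x: float = 0.0   # estimated position
    v: float = 0.0   # estimated velocity
    # 2x2 state covariance stored as [[p00, p01], [p10, p11]]
    p: list = field(default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])
    q: float = 1e-3  # process noise (assumed value)
    r: float = 1.0   # measurement noise (assumed value)

    def predict(self, dt: float = 1.0) -> float:
        """Propagate the state by one frame and return the predicted position."""
        self.x += self.v * dt
        (p00, p01), (p10, p11) = self.p
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.p = [[p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
                  [p10 + dt * p11, p11 + self.q]]
        return self.x

    def update(self, z: float) -> None:
        """Correct the state with a measured position z (H = [1, 0])."""
        (p00, p01), (p10, p11) = self.p
        s = p00 + self.r            # innovation covariance
        k0, k1 = p00 / s, p10 / s   # Kalman gain
        y = z - self.x              # innovation
        self.x += k0 * y
        self.v += k1 * y
        # P = (I - K H) P
        self.p = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


kf = Kalman1D()
for t in range(50):
    kf.predict()         # where the tracking area is expected in this frame
    kf.update(2.0 * t)   # measured centre: the target moves 2 pixels per frame
next_centre = kf.predict()
```

In a two-dimensional tracker, one such filter per coordinate (or a single filter with a larger state vector) could supply the predicted position of the tracking area for the next frame.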
Step S14, segmenting the tracking area of the target image by adopting the lightweight convolutional neural network to obtain the target image.
The lightweight convolutional neural network of this step is the same as that of step S12. It should be noted that, in this step, the lightweight convolutional neural network segments only the tracking area of the target image rather than the whole frame, which greatly reduces the amount of computation.
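The saving described in step S14 can be sketched as follows: the tracking area is cropped from the frame, only the crop is passed to the segmenter, and the resulting mask is pasted back into a full-size mask. This is a minimal sketch under stated assumptions: the frame is a nested list of grey values, and `threshold_segmenter` is a hypothetical stand-in for the lightweight convolutional neural network, not the network itself.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale frame as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def segment_tracking_area(frame: Image, box: Box,
                          segment: Callable[[Image], Image]) -> Image:
    """Run `segment` on the tracking area only and return a full-size mask."""
    top, left, bottom, right = box
    crop = [row[left:right] for row in frame[top:bottom]]
    crop_mask = segment(crop)                    # the segmenter sees only the crop
    mask = [[0] * len(frame[0]) for _ in frame]  # pixels outside the box stay background
    for dy, mask_row in enumerate(crop_mask):
        mask[top + dy][left:right] = mask_row
    return mask


def threshold_segmenter(image: Image) -> Image:
    """Placeholder for the lightweight network: marks bright pixels as target."""
    return [[1 if px > 128 else 0 for px in row] for row in image]


frame = [[0] * 8 for _ in range(6)]
frame[2][3] = frame[3][3] = frame[3][4] = 200  # a small bright target
mask = segment_tracking_area(frame, (1, 2, 5, 6), threshold_segmenter)
pixels_processed = (5 - 1) * (6 - 2)           # 16 of the 48 pixels in the frame
```

In this toy case the segmenter processes one third of the pixels of the frame; the actual saving depends on the size of the tracking area relative to the frame.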
In the embodiment of the application, a video image is acquired; the 1st frame of the video image is segmented by a lightweight convolutional neural network to obtain a target image; the target image is tracked in subsequent frames of the video image by a preset target tracking method to obtain a tracking area of the target image; and the tracking area is segmented by the lightweight convolutional neural network to obtain the target image. Because the 1st frame is segmented by the lightweight convolutional neural network, the target image of the 1st frame can be identified quickly. Because the tracking area of the target image is determined in subsequent frames by the preset target tracking method, the lightweight convolutional neural network only needs to segment the tracking area rather than the whole frame, which greatly reduces the amount of computation required to segment the target image across the whole video image.
Optionally, since tracking of the target image may fail during the tracking process, in order to segment the target image in time, the target image segmentation method further includes:
If tracking the target image of a subsequent frame of the video image by the preset target tracking method fails, segmenting that subsequent frame of the video image by adopting the lightweight convolutional neural network to obtain a new target image.
Specifically, if tracking of the target image is found to fail in a certain subsequent frame, the lightweight convolutional neural network is adopted to segment that frame to obtain a segmented target image. The target image is then tracked in the frames following that frame by the preset target tracking method.
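The fallback on tracking failure can be sketched as a per-frame loop. The sketch below is illustrative only: the three callables and the toy failure criterion (a jump of more than 5 pixels) are assumptions, and each frame is represented simply by the true box of the target so that the example stays self-contained.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]


def process_video(frames: List[Box],
                  segment_full: Callable[[Box], Box],
                  track: Callable[[Box, Box], Optional[Box]],
                  segment_area: Callable[[Box, Box], Box]) -> List[str]:
    """Return, for each frame, which path was taken: 'full' or 'tracked'."""
    log: List[str] = []
    box: Optional[Box] = None
    for index, frame in enumerate(frames):
        # Tracking applies only to subsequent frames with a known target.
        area = track(frame, box) if index > 0 and box is not None else None
        if area is None:
            # 1st frame, or tracking failed: segment the whole frame.
            box = segment_full(frame)
            log.append("full")
        else:
            # Tracking succeeded: segment only the tracking area.
            box = segment_area(frame, area)
            log.append("tracked")
    return log


def toy_track(frame: Box, previous: Box) -> Optional[Box]:
    """Hypothetical tracker: fails when the target jumps more than 5 pixels."""
    return frame if abs(frame[0] - previous[0]) <= 5 else None


frames = [(0, 0, 10, 10), (2, 0, 12, 10), (40, 0, 50, 10), (41, 0, 51, 10)]
log = process_video(frames,
                    segment_full=lambda frame: frame,
                    track=toy_track,
                    segment_area=lambda frame, area: area)
```

After a failure, the box obtained from the full-frame segmentation becomes the starting point for tracking in the following frames, matching the behaviour described above.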
Example two:
Fig. 3 shows a flowchart of a second target image segmentation method provided in the second embodiment of the present application. In this embodiment, step S31, step S32, and step S35 are the same as step S11, step S12, and step S14 in the first embodiment, respectively, and are not repeated here.
In order to segment the target image more quickly, the tracking area of the target image is determined by using a preset target tracking method only for the I frame of the video image, which is detailed as follows:
Step S31, acquiring a video image;
Step S32, segmenting the 1st frame of the video image by adopting a lightweight convolutional neural network to obtain a target image;
Step S33, judging whether the subsequent frame of the video image is an I frame;
Step S34, if the subsequent frame of the video image is an I frame, tracking the target image of the I frame by using the preset target tracking method to obtain a tracking area of the target image.
Step S35, segmenting the tracking area of the target image by adopting the lightweight convolutional neural network to obtain the target image.
In the embodiment of the application, when the target image is tracked in subsequent frames of the video image, only the I frames among the subsequent frames are tracked and the non-I frames are not, which reduces the amount of tracking computation. Because an I frame contains most of the information of the video image, tracking the target image only in I frames can still ensure accurate segmentation of the target image.
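The frame selection of steps S33 and S34 can be illustrated with a short sketch. It assumes that the frame types are already available as a string of `I`, `P`, and `B` characters in display order; how the types are read from the video stream is not covered here, and the pattern used is a hypothetical example.

```python
from typing import List


def frames_to_track(frame_types: str) -> List[int]:
    """Indices of subsequent frames (index >= 1) that are I frames."""
    return [i for i, kind in enumerate(frame_types) if i >= 1 and kind == "I"]


gop = "IBBPBBPIBBPBBPI"          # hypothetical frame-type pattern
tracked = frames_to_track(gop)   # frames on which the tracker is run
subsequent = len(gop) - 1        # every frame after the 1st
```

For this pattern the tracker runs on 2 of the 14 subsequent frames, which is where the reduction in tracking computation comes from.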
Optionally, if it is desired to implement timely segmentation of a target image, after determining that a subsequent frame of the video image is an I frame, segmenting the I frame of the video image by using the lightweight convolutional neural network to obtain the target image; and when the subsequent frame of the video image is judged to be a non-I frame, tracking the target image of the non-I frame by adopting the preset target tracking method to obtain a tracking area of the target image.
Here, the non-I frame includes a B frame and a P frame.
Of course, if the subsequent frame of the video image is a non-I frame and the tracking of the target image by the preset target tracking method fails, the non-I frame is still segmented by the lightweight convolutional neural network to obtain the target image.
In this embodiment, because the lightweight convolutional neural network is adopted for target image segmentation for each I frame, the probability of failure in tracking the target image by using a preset target tracking method is reduced, and thus the target image can be segmented in time.
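The optional variant just described amounts to a per-frame decision, sketched below. The action names are hypothetical labels introduced for this example, and the `tracking_ok` flag stands for the outcome of the preset target tracking method on a non-I frame.

```python
def choose_action(frame_type: str, tracking_ok: bool = True) -> str:
    """Pick the processing path for one subsequent frame."""
    if frame_type == "I":
        # Every I frame is segmented by the network as a whole.
        return "segment_full_frame"
    if frame_type in ("P", "B"):
        # Non-I frames are tracked; on failure the network takes over.
        return "track_then_segment_area" if tracking_ok else "segment_full_frame"
    raise ValueError(f"unknown frame type: {frame_type!r}")


actions = [choose_action(kind) for kind in "IBBP"]
```

Because each I frame yields a fresh segmentation, the tracker restarts from a reliable target at every I frame.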
Example three:
Fig. 4 shows a flowchart of a third target image segmentation method provided in the third embodiment of the present application. Step S41, step S42, and step S45 in this embodiment are respectively the same as step S11, step S12, and step S14 in the first embodiment, and are not repeated here.
Step S41, acquiring a video image;
Step S42, segmenting the 1st frame of the video image by adopting a lightweight convolutional neural network to obtain a target image;
Step S43, judging whether a scene corresponding to the video image is a preset scene, and if so, tracking the target image of a subsequent frame of the video image by adopting a first preset target tracking method to obtain a tracking area of the target image, wherein the preset scene is a scene with known noise;
Specifically, if the noise of the scene is known, for example in a scene in which a certain person or article is followed while shooting, a first preset target tracking method, such as a Kalman filter tracking method, is selected.
Step S44, if the scene is not the preset scene, tracking the target image of the subsequent frame of the video image by adopting a second preset target tracking method to obtain the tracking area of the target image.
Specifically, if the noise of the scene is unknown, for example in a scene of a large event (including multiple targets), a second preset target tracking method, such as a particle filter tracking method, is selected.
Step S45, segmenting the tracking area of the target image by adopting the lightweight convolutional neural network to obtain the target image.
In this embodiment, different target tracking methods are selected according to the scene of the video image, so that the accuracy of determining the tracking area of the target image can be improved.
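The selection in steps S43 and S44 can be sketched as a lookup against a set of preset scenes. The scene labels and the names `Tracker`, `PRESET_SCENES`, and `select_tracker` are hypothetical; the application only states that a preset scene is one whose noise is known, and gives Kalman filtering and particle filtering as examples of the first and second methods.

```python
from enum import Enum


class Tracker(Enum):
    KALMAN_FILTER = "kalman_filter"      # example of the first preset method
    PARTICLE_FILTER = "particle_filter"  # example of the second preset method


# Hypothetical labels for preset scenes, i.e. scenes whose noise is known.
PRESET_SCENES = frozenset({"single_subject_follow_shot"})


def select_tracker(scene: str) -> Tracker:
    """Return the tracking method for the scene of the video image."""
    if scene in PRESET_SCENES:
        return Tracker.KALMAN_FILTER
    return Tracker.PARTICLE_FILTER


follow_shot_tracker = select_tracker("single_subject_follow_shot")
event_tracker = select_tracker("large_event")
```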
Optionally, the step S11 (or step S31 or step S41) includes:
Detecting whether a video shooting instruction is received, and if the video shooting instruction is received, acquiring a video image.
Specifically, when a user opens a camera interface and selects video shooting, a video shooting instruction is sent, and at this time, the terminal device acquires a corresponding video image.
Optionally, after the step S11 (or step S31 or step S41), the method includes:
Acquiring the resolution of the video image, and judging whether the resolution of the video image is greater than a preset resolution threshold;
At this time, the step S12 (or step S32 or step S42) specifically includes:
If the resolution of the video image is greater than the preset resolution threshold, segmenting the 1st frame of the video image by adopting a lightweight convolutional neural network to obtain a target image.
Optionally, when the resolution of the video image is less than or equal to the preset resolution threshold, Mask RCNN is used to segment the video image to obtain the corresponding target image. Mask RCNN is a convolutional network proposed by Kaiming He et al. on the basis of the Faster RCNN architecture.
In this embodiment, when the resolution of the video image is greater than the preset resolution threshold, the 1st frame of the video image is segmented by the lightweight convolutional neural network; otherwise, the video image is segmented by Mask RCNN. In this way, the target image can be segmented accurately when the resolution of the video image is low, and can be segmented both quickly and accurately when the resolution is high.
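The resolution check can be sketched as below. Two points are assumptions made for this example only: the resolution is compared as a pixel count (width times height), and the default threshold of 1280 x 720 is a hypothetical value; the application does not fix either.

```python
from typing import Tuple


def select_segmenter(width: int, height: int,
                     threshold: Tuple[int, int] = (1280, 720)) -> str:
    """Choose the segmentation network from the resolution of the video image."""
    if width * height > threshold[0] * threshold[1]:
        return "lightweight_cnn"  # high resolution: favour speed
    return "mask_rcnn"            # at or below the threshold: favour accuracy


high_res_choice = select_segmenter(1920, 1080)
low_res_choice = select_segmenter(640, 480)
```

A resolution exactly equal to the threshold falls on the Mask RCNN side, consistent with the "less than or equal to" condition stated above.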
Example four:
Fig. 5 is a schematic structural diagram of a target image segmentation apparatus provided in the fourth embodiment of the present application. For convenience of description, only the portions related to this embodiment are shown:
The target image segmentation apparatus 5 comprises: a video image acquisition unit 51, a 1st frame image segmentation unit 52, a tracking area determination unit 53, and a tracking area segmentation unit 54. Wherein:
a video image acquisition unit 51 for acquiring a video image;
a 1st frame image segmentation unit 52, configured to segment the 1st frame of the video image by using a lightweight convolutional neural network to obtain a target image;
Optionally, the lightweight convolutional neural network here is MobileUnet.
a tracking area determination unit 53, configured to track the target image of a subsequent frame of the video image by using a preset target tracking method to obtain a tracking area of the target image;
and a tracking area segmentation unit 54, configured to segment the tracking area of the target image by using the lightweight convolutional neural network to obtain the target image.
In the embodiment of the application, the target image of the 1st frame is obtained by segmentation with the lightweight convolutional neural network, so the target image of the 1st frame can be identified quickly. In subsequent frames, the tracking area of the target image is determined by the preset target tracking method, so the lightweight convolutional neural network only needs to segment the tracking area rather than the whole frame, which greatly reduces the amount of computation required to segment the target image across the whole video image.
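One possible way to picture the cooperation of units 51 to 54 in software is a small container holding one callable per unit. This is a sketch under stated assumptions, not the structure of the apparatus itself: the class name, field names, and the string stand-ins for frames and results are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TargetImageSegmentationApparatus:
    acquire_video: Callable[[], List[str]]               # unit 51
    segment_first_frame: Callable[[str], str]            # unit 52
    determine_tracking_area: Callable[[str, str], str]   # unit 53
    segment_tracking_area: Callable[[str, str], str]     # unit 54

    def run(self) -> List[str]:
        """Segment the 1st frame fully, then only the tracking area of later frames."""
        frames = self.acquire_video()
        if not frames:
            return []
        targets = [self.segment_first_frame(frames[0])]
        for frame in frames[1:]:
            area = self.determine_tracking_area(frame, targets[-1])
            targets.append(self.segment_tracking_area(frame, area))
        return targets


apparatus = TargetImageSegmentationApparatus(
    acquire_video=lambda: ["f1", "f2", "f3"],
    segment_first_frame=lambda frame: f"full:{frame}",
    determine_tracking_area=lambda frame, previous: f"area:{frame}",
    segment_tracking_area=lambda frame, area: f"roi:{frame}",
)
result = apparatus.run()
```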
Optionally, the target image segmentation apparatus 5 further includes:
and the target image re-segmentation unit is used for segmenting the subsequent frame of the video image by adopting the lightweight convolutional neural network to obtain a new target image if the target image of the subsequent frame of the video image fails to be tracked by adopting a preset target tracking method.
Optionally, the tracking area determining unit 53 includes:
the frame type judging module is used for judging whether the subsequent frame of the video image is an I frame;
and the I frame processing module is used for tracking the target image of the I frame by adopting the preset target tracking method to obtain a tracking area of the target image if the subsequent frame of the video image is the I frame.
Optionally, the tracking area determining unit 53 includes:
the I frame target image segmentation module is used for segmenting the I frame of the video image by adopting the lightweight convolutional neural network after judging that the subsequent frame of the video image is an I frame to obtain the target image;
and the non-I-frame target image tracking module is used for tracking the target image of the non-I frame by adopting the preset target tracking method when judging that the subsequent frame of the video image is a non-I frame so as to obtain a tracking area of the target image.
Optionally, the tracking area determining unit 53 includes:
the first tracking area determining module is configured to determine whether a scene corresponding to the video image is a preset scene, and if the scene is the preset scene, track the target image of a subsequent frame of the video image by using a first preset target tracking method to obtain a tracking area of the target image, where the preset scene is a scene with known noise;
and the second tracking area determining module is used for tracking the target image of the subsequent frame of the video image by adopting a second preset target tracking method if the scene is not the preset scene, so as to obtain the tracking area of the target image.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Example five:
Fig. 6 is a schematic diagram of a terminal device provided in the fifth embodiment of the present application. As shown in fig. 6, the terminal device 6 of this embodiment includes: a processor 60, a memory 61 and a computer program 62 stored in said memory 61 and executable on said processor 60. The processor 60, when executing the computer program 62, implements the steps in the various target image segmentation method embodiments described above, such as the steps S11-S14 shown in fig. 1. Alternatively, the processor 60, when executing the computer program 62, implements the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the units 51 to 54 shown in fig. 5.
Illustratively, the computer program 62 may be partitioned into one or more modules/units that are stored in the memory 61 and executed by the processor 60 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 62 in the terminal device 6. For example, the computer program 62 may be divided into a video image acquisition unit, a 1st frame image segmentation unit, a tracking area determination unit, and a tracking area segmentation unit, and the specific functions of each unit are as follows:
a video image acquisition unit for acquiring a video image;
the 1st frame image segmentation unit is used for segmenting the 1st frame of the video image by adopting a lightweight convolutional neural network to obtain a target image;
a tracking area determining unit, configured to track the target image of a subsequent frame of the video image by using a preset target tracking method to obtain a tracking area of the target image;
and the tracking area segmentation unit is used for segmenting the tracking area of the target image by adopting the lightweight convolutional neural network to obtain the target image.
The terminal device 6 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 60 and a memory 61. Those skilled in the art will appreciate that fig. 6 is merely an example of the terminal device 6 and does not constitute a limitation of the terminal device 6, which may include more or fewer components than those shown, or combine some components, or have different components; for example, the terminal device may also include input/output devices, network access devices, buses, etc.
The processor 60 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or any conventional processor or the like.
The memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or a memory of the terminal device 6. The memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 6. Further, the memory 61 may also include both an internal storage unit and an external storage device of the terminal device 6. The memory 61 is used for storing the computer program and other programs and data required by the terminal device. The memory 61 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and which, when executed by a processor, realizes the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be expanded or restricted as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, under legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.