CN108966004B

CN108966004B - Video processing method and terminal

Info

Publication number: CN108966004B
Application number: CN201810679769.9A
Authority: CN
Inventors: 李攀
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2018-06-27
Filing date: 2018-06-27
Publication date: 2022-06-17
Anticipated expiration: 2038-06-27
Also published as: CN108966004A

Abstract

Embodiments of the present invention provide a video processing method and a terminal, which are applied in the field of communication technologies to solve the problem of low convenience when a terminal plays a video. The method includes: receiving a first input from a user, where the first input is used to trigger a terminal to determine a target video; in response to the first input, acquiring N frames of images of the target video, where N is a positive integer; acquiring M image frame sequences according to the N frames of images, where the picture similarity between any two adjacent images in each of the M image frame sequences is less than or equal to a preset threshold, M is a positive integer, and M is less than or equal to N; and synthesizing the images in each of the M image frame sequences respectively to obtain M video segments. The solution is specifically applied to the process of generating video segments of the target video before they are played.

Description

Video processing method and terminal

Technical Field

The embodiment of the invention relates to the technical field of communication, in particular to a video processing method and a terminal.

Background

With the development of communication technology, the intelligent degree of terminals such as mobile phones and tablet computers is continuously improved so as to meet various requirements of users. For example, users have an increasing demand for convenience in playing media files such as videos using terminals.

In the prior art, in a scene in which a user needs to view a part of segments in a video to quickly browse an entire video file, the user generally needs to drag a progress bar in a video playing interface to find a video node in which the user is interested.

The problem with this approach is that the user may need to drag the progress bar in the video playing interface repeatedly to find a video node of interest, which makes the operation of playing a video on the terminal cumbersome; that is, the convenience of playing a video on the terminal is low.

Disclosure of Invention

The embodiment of the invention provides a video processing method and a terminal, and aims to solve the problem that the terminal is low in convenience in video playing.

In order to solve the above technical problem, the embodiment of the present invention is implemented as follows:

In a first aspect, an embodiment of the present invention provides a video processing method, where the method includes: receiving a first input of a user, wherein the first input is used for triggering a terminal to determine a target video; in response to the first input, acquiring N frames of images of the target video, wherein N is a positive integer; acquiring M image frame sequences according to the N frames of images, wherein the picture similarity between any two adjacent images in each of the M image frame sequences is less than or equal to a preset threshold, M is a positive integer, and M is less than or equal to N; and synthesizing the images in each of the M image frame sequences respectively to obtain M video segments.

In a second aspect, an embodiment of the present invention further provides a terminal, where the terminal includes a receiving module, an acquisition module, and an integration module. The receiving module is used for receiving a first input of a user, where the first input is used for triggering the terminal to determine a target video. The acquisition module is used for acquiring, in response to the first input received by the receiving module, N frames of images of the target video, wherein N is a positive integer, and for acquiring M image frame sequences according to the N frames of images, wherein the picture similarity between any two adjacent images in each of the M image frame sequences is less than or equal to a preset threshold, M is a positive integer, and M is less than or equal to N. The integration module is used for synthesizing the images in each of the M image frame sequences acquired by the acquisition module respectively to obtain M video segments.

In a third aspect, an embodiment of the present invention provides a terminal, including a processor, a memory, and a computer program stored on the memory and operable on the processor, where the computer program, when executed by the processor, implements the steps of the video processing method according to the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the video processing method according to the first aspect.

In the embodiment of the invention, the terminal can autonomously generate a plurality of video clips (such as M video clips) from a complete target video, with the similarity between any two frames of images in each video clip being less than or equal to a preset threshold, without requiring the user to manually select multiple video clips of the target video. Each of the plurality of video clips may serve as a summary of the target video. Therefore, the user can control the terminal to view the video clips and thus browse the target video quickly, which improves the convenience of playing videos on the terminal.

Drawings

Fig. 1 is a schematic diagram of an architecture of a possible android operating system according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a possible terminal according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a hardware structure of a terminal according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that "/" in this context means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. "Plurality" means two or more.

It should be noted that, in the embodiments of the present invention, words such as "exemplary" or "for example" are used to indicate examples, illustrations or explanations. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present invention is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the words "exemplary" or "for example" is intended to present relevant concepts in a concrete fashion.

The terms "first," "second," and "third," etc. in the description and in the claims of the present invention are used for distinguishing between different objects and not for describing a particular sequential order of the objects. For example, the first input, second input, third input, etc. are for distinguishing between different inputs, rather than for describing a particular order of inputs.

According to the video processing method and the terminal provided by the embodiment of the invention, the terminal can generate a plurality of video clips from a complete video, and the images in each video clip are different images. Each of the plurality of video clips can serve as a summary of the complete video. Therefore, the user can control the terminal to view the video clips and thus browse the complete video quickly, which improves the convenience of playing videos on the terminal.

It should be noted that, in the video processing method provided in the embodiment of the present invention, the execution subject may be the terminal, a Central Processing Unit (CPU) of the terminal, or a control module in the terminal for executing the video processing method. In the embodiment of the present invention, a video processing method executed by a terminal is taken as an example to describe the video processing method provided in the embodiment of the present invention.

The terminal in the embodiment of the present invention may be a terminal having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present invention.

The following describes a software environment to which the video processing method provided by the embodiment of the present invention is applied, by taking an android operating system as an example.

Fig. 1 is a schematic diagram of an architecture of a possible android operating system according to an embodiment of the present invention. In fig. 1, the architecture of the android operating system includes 4 layers, which are respectively: an application layer, an application framework layer, a system runtime layer, and a kernel layer (specifically, a Linux kernel layer).

The application program layer comprises various application programs (including system application programs and third-party application programs) in an android operating system.

The application framework layer is the framework of applications, and a developer can develop applications based on the application framework layer while complying with the development principles of that framework. Examples include system applications such as a system settings application, a system chat application, and a system camera application, as well as third-party applications such as a third-party settings application, a third-party camera application, and a third-party chat application.

The system runtime layer includes libraries (also called system libraries) and android operating system runtime environments. The library mainly provides various resources required by the android operating system. The android operating system running environment is used for providing a software environment for the android operating system.

The kernel layer is an operating system layer of an android operating system and belongs to the bottommost layer of an android operating system software layer. The kernel layer provides kernel system services and hardware-related drivers for the android operating system based on the Linux kernel.

Taking an android operating system as an example, in the embodiment of the present invention, a developer may develop a software program for implementing the video processing method provided in the embodiment of the present invention based on the system architecture of the android operating system shown in fig. 1, so that the video processing method may operate based on the android operating system shown in fig. 1. Namely, the processor or the terminal device can implement the video processing method provided by the embodiment of the invention by running the software program in the android operating system.

The following describes the video processing method provided by the embodiment of the present invention in detail with reference to the flowchart of the video processing method shown in fig. 2. Wherein, although the logical order of the video processing methods provided by embodiments of the present invention is shown in a method flow diagram, in some cases, the steps shown or described may be performed in an order different than here. For example, the video processing method shown in fig. 2 may include S201-S204:

S201, the terminal receives a first input of a user, wherein the first input is used for triggering the terminal to determine a target video.

Optionally, a video playing application, such as a system video playing application or a third-party video playing application, may be installed on the terminal. The interface of the video application displayed by the terminal (i.e., the video interface) may include one or more complete videos. The target video may be a complete video displayed in the video interface, that is, an unprocessed video.

Optionally, the first input provided by the embodiment of the present invention may be an input by the user on a target video in the video interface of the terminal.

Optionally, the video interface provided in the embodiment of the present invention may further include a control (denoted as control 1), where the control is used to trigger the terminal to determine the target video and split the target video into multiple frames of images. Illustratively, the first input may be an input of a user to a control 1 in a video interface of the terminal.

Of course, the first input may also be an input of a user to a target video in a terminal video interface and an input to the control 1, where the control 1 is specifically configured to trigger the terminal to split the target video into multiple frames of images.

For example, the first input may be a user selection input of a target video displayed in a video interface of the terminal.

It should be noted that the screen of the terminal provided in the embodiment of the present invention may be a touch screen, and the touch screen may be configured to receive an input from a user and, in response to the input, display the corresponding content to the user. The first input may be a touch screen input, a fingerprint input, a gravity input, a key input, or the like. A touch screen input is an input by the user on the touch screen of the terminal, such as a press, a long press, a slide, a click, or a hover input (an input made by the user near the touch screen). A fingerprint input is an input by the user on the fingerprint recognizer of the terminal, such as a sliding fingerprint, a long-press fingerprint, a single-click fingerprint, or a double-click fingerprint. A gravity input is an input such as the user shaking the terminal in a specific direction or a specific number of times. A key input is an input such as a single click, a double click, a long press, or a key combination by the user on a power key, a volume key, a Home key, or the like of the terminal. The embodiment of the present invention does not specifically limit the manner of the first input, which may be any realizable manner.

S202, in response to the first input, the terminal acquires N frames of images of the target video, wherein N is a positive integer.

Specifically, the terminal may split the target video into N frames of images, where the N frames of images are a complete image frame sequence of the target video.

S203, the terminal acquires M image frame sequences according to the N frames of images, wherein the picture similarity between any two adjacent images in each of the M image frame sequences is less than or equal to a preset threshold, M is a positive integer, and M is less than or equal to N.

It should be noted that any two images in one video provided in the embodiment of the present invention have the same size. Optionally, the picture similarity between two adjacent images may be the area of the overlapping region between the two adjacent images. The larger the area of the overlapping region, the higher the picture similarity; the smaller the area of the overlapping region, the lower the picture similarity.

Alternatively, the picture similarity between two adjacent images may be a similarity between values of statistical histograms of the two adjacent images. The value of the statistical histogram of a frame of image may be a ratio of gray pixels to total pixels in the image.
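One way to picture a histogram-based similarity is the following minimal Python sketch. It assumes each frame is a grayscale image given as a flat list of pixel values in the range 0 to 255. The bin count, the histogram-intersection measure, and the names `gray_histogram` and `picture_similarity` are illustrative assumptions; the description does not fix a particular formula.

```python
from typing import List, Sequence


def gray_histogram(pixels: Sequence[int], bins: int = 8) -> List[float]:
    """Return a normalized histogram: the share of pixels in each gray-level bin."""
    if not pixels:
        raise ValueError("frame has no pixels")
    counts = [0] * bins
    for value in pixels:
        # Map a 0..255 gray level to one of `bins` equal-width bins.
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def picture_similarity(frame_a: Sequence[int], frame_b: Sequence[int], bins: int = 8) -> float:
    """Histogram intersection in [0, 1]; 1.0 means the two histograms are identical."""
    hist_a = gray_histogram(frame_a, bins)
    hist_b = gray_histogram(frame_b, bins)
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))
```

Under these assumptions, two frames would be treated as distinct enough to sit next to each other in a sequence when `picture_similarity` returns a value less than or equal to the preset threshold.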

Specifically, the higher picture similarity of the two frames of images indicates that the two frames of images are relatively similar.

It should be noted that a user usually needs the images in a video segment (i.e., a summary) of a video not to be similar to one another; that is, the similarity between the images in the video segment should be low, for example less than or equal to the preset threshold. In other words, the user usually needs a video segment of a video to be a relatively compact video segment with non-repetitive images.

It can be understood that the summary of a video is a video segment of that video required by the user, and the user can control the terminal to view the summary of a video in order to browse the video quickly. The better a video segment determined by the terminal meets the user's requirements, the higher the accuracy of the video segment, and the better the effect of browsing the video when the user controls the terminal to view that video segment.

In the embodiment of the present invention, the value of the preset threshold is not specifically limited, and this does not limit the implementation of the video processing method in the embodiment of the present invention.

S204, the terminal synthesizes images in each image frame sequence in the M image frame sequences respectively to obtain M video clips.

The terminal can encode images in an image frame sequence to synthesize the images in the image frame sequence to obtain a video segment.

It is understood that each frame of the plurality of frames of images in a video has a frame number (or serial number). Specifically, the terminal may encode a plurality of images in one image frame sequence in the order of the frame numbers of each image.
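The ordering step can be sketched in Python as follows. The `Frame` type and the `order_for_encoding` helper are hypothetical names introduced only for illustration, and the actual encoding is left out because it depends on the codec library used by the terminal.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    number: int   # frame number (serial number) within the target video
    content: str  # placeholder standing in for the decoded image data


def order_for_encoding(sequence: List[Frame]) -> List[Frame]:
    """Return the frames sorted by frame number, the order in which they would be encoded."""
    return sorted(sequence, key=lambda frame: frame.number)
```

For example, a sequence holding frames numbered 7, 2 and 5 would be handed to the encoder in the order 2, 5, 7.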

It should be noted that, with the video processing method provided in the embodiment of the present invention, a terminal can autonomously generate a plurality of video segments (for example, M video segments) from a complete target video, with the similarity between any two frames of images in each video segment being less than or equal to a preset threshold, without requiring the user to manually select multiple video segments of the target video. Each of the plurality of video segments may serve as a summary of the target video. Therefore, the user can control the terminal to view the video segments and thus browse the target video quickly, which improves the convenience of playing videos on the terminal.

In a possible implementation manner, in the video processing method provided by the embodiment of the present invention, the content type of each frame of image in each of the M image frame sequences is the same.

In the embodiment of the invention, two frames of images having the same content type means that the two frames of images contain content of the same type.

Optionally, the content type of one frame of image provided in the embodiment of the present invention may be a landscape type, a person type, an animal type, an indoor type, or an outdoor type. Illustratively, the content types of the two frames of images are both landscape types, and the content in the two frames of images is both landscape.

Specifically, the content types of the images in an image frame sequence are the same; that is, the content types of the images in the video segment synthesized from that image frame sequence are the same.

It can be appreciated that a user typically requires that the content types of images in a video segment of a video be the same so that the video segment meets the user's requirements.

It should be noted that, in the video processing method provided in the embodiment of the present invention, since the content types of the images in each of the M image frame sequences of the target video obtained by the terminal are the same, the M video segments synthesized from the M image frame sequences are video segments meeting the user's requirements. Therefore, the accuracy of the video segments of the target video obtained by the terminal can be improved, so that the M video segments serve more accurately as summaries of the target video, and the effect of the user controlling the terminal to view the video segments and browse the target video is better. This further improves the convenience of playing videos on the terminal.

In a possible implementation manner, according to the video processing method provided by the embodiment of the present invention, the terminal may obtain the M image frame sequences corresponding to the target video according to a certain rule. Specifically, S203 in the above embodiment may include S203a and S203b:

S203a, the terminal divides the N frames of images into a plurality of candidate image frame sequences according to the content types of the N frames of images, where the plurality of candidate image frame sequences belong to M groups, each group includes an image frame sequence of at least one content type, and the content types corresponding to the groups are different.

Specifically, the terminal may divide the N frames of images into the plurality of candidate image frame sequences according to the content types of the N frames of images, using preset classification manners, where one preset classification manner corresponds to one group of image frame sequences.

The content type corresponding to each group is different, which means that the content types of the images in different image frame sequences in each group are different.

Optionally, the preset classification manner may be scene classification and location classification. The group corresponding to the scene classification can be recorded as a scene group, and the group corresponding to the place classification can be recorded as a place group.

For example, the content types corresponding to the scene groups may include a landscape type, a people type, and an animal type, and the content types corresponding to the place groups may include an indoor type and an outdoor type.

Specifically, assume that M is 2 and the M groups include group 1 and group 2, where group 1 is the scene group and group 2 is the place group. The plurality of candidate image frame sequences may include image frame sequence 1, image frame sequence 2, image frame sequence 3, image frame sequence 4, and image frame sequence 5. Group 1 includes image frame sequence 1, image frame sequence 2, and image frame sequence 3, where the content type corresponding to image frame sequence 1 is the landscape type, the content type corresponding to image frame sequence 2 is the person type, and the content type corresponding to image frame sequence 3 is the animal type. Group 2 includes image frame sequence 4 and image frame sequence 5, where the content type corresponding to image frame sequence 4 is the indoor type and the content type corresponding to image frame sequence 5 is the outdoor type.

Of course, the content types of the images in each of the image frame sequences 1-5 are the same, for example, the content types of the images in the image frame sequence 1 are all landscape types.

S203b, the terminal acquires the image frame sequence with the largest frame number in each group to obtain M image frame sequences.

It will be appreciated that the greater the number of frames of images in a sequence of image frames, the more abundant the images the sequence of image frames contains. Users generally demand that images in a video segment of a video are rich, so that the effect of browsing the video through the video segment is better.

Illustratively, among image frame sequence 1, image frame sequence 2, and image frame sequence 3 included in group 1, the image frame sequence with the largest number of frames is image frame sequence 1; among image frame sequence 4 and image frame sequence 5 included in group 2, the image frame sequence with the largest number of frames is image frame sequence 4. In this case, the M image frame sequences in the above embodiment are image frame sequence 1 and image frame sequence 4.

It should be noted that, in the video processing method provided by the embodiment of the present invention, the terminal may determine, by using a plurality of candidate image frame sequences included in the M groups, an image frame sequence with the maximum frame number in each group, that is, an image frame sequence with relatively rich images in the plurality of candidate image frame sequences. Therefore, the accuracy of the M video clips of the target video obtained by the terminal can be improved, and the effect that a user browses the target video through the M video clips is better. Furthermore, the convenience of playing videos by the terminal can be further improved.
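S203a and S203b can be sketched roughly in Python as follows. The sketch assumes that some classifier has already labeled each frame with one content type per classification manner; the classifier itself, the `GROUPS` table, and both function names are hypothetical and only mirror the scene/place example in the text.

```python
from typing import Dict, List

# Hypothetical mapping from a classification manner (group) to its content types,
# mirroring the scene/place example above.
GROUPS: Dict[str, List[str]] = {
    "scene": ["landscape", "person", "animal"],
    "place": ["indoor", "outdoor"],
}


def split_by_content_type(frame_labels: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """S203a: build one candidate sequence (list of frame numbers) per content type."""
    candidates: Dict[str, List[int]] = {}
    for number in sorted(frame_labels):
        for content_type in frame_labels[number]:
            candidates.setdefault(content_type, []).append(number)
    return candidates


def longest_per_group(candidates: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """S203b: in each group, keep the candidate sequence with the most frames."""
    selected: Dict[str, List[int]] = {}
    for group, content_types in GROUPS.items():
        present = [candidates[t] for t in content_types if t in candidates]
        if present:
            # max() returns the first longest sequence when several tie.
            selected[group] = max(present, key=len)
    return selected
```

With two groups, `longest_per_group` returns at most two sequences, matching the M = 2 example in which image frame sequence 1 and image frame sequence 4 are chosen.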

In a possible implementation manner, the video processing method provided in the embodiment of the present invention may further include S203c before S203b in the above embodiment:

S203c, for each candidate image frame sequence in the plurality of candidate image frame sequences, when the picture similarity between a first image and a second image in the candidate image frame sequence is greater than the preset threshold, the terminal deletes the first image, where the first image and the second image are any two adjacent frames of images in the candidate image frame sequence.

Optionally, in one candidate image frame sequence, the frame number of the first image may be greater than that of the second image, or the frame number of the first image may be smaller than that of the second image.

It can be understood that, when the similarity between the first image and the second image is greater than the preset threshold, it indicates that the similarity between the first image and the second image is high, i.e. the first image and the second image can be regarded as a repeated image.

It should be noted that, in the video processing method provided by the embodiment of the present invention, the terminal may delete one image (for example, the first image) of the repeated first image and the repeated second image in each candidate image frame sequence, so as to obtain a candidate image frame sequence with non-repeated images and relatively compact images. Therefore, the accuracy of the video clip of the target video obtained by the terminal can be improved, and the effect of browsing the video through the video clip by the user is better. Furthermore, the convenience of playing videos by the terminal can be further improved.
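The deletion step can be illustrated with the following minimal Python sketch. The `similarity` callable is supplied by the caller and is an assumption here, as is the choice to delete the later frame of a repeated pair; the description allows the deleted image to be either the earlier or the later one.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def drop_repeated_neighbors(
    sequence: List[T],
    similarity: Callable[[T, T], float],
    threshold: float,
) -> List[T]:
    """Keep a frame only if its similarity to the last kept frame is <= threshold."""
    kept: List[T] = []
    for frame in sequence:
        if kept and similarity(kept[-1], frame) > threshold:
            # The pair counts as repeated; this sketch deletes the later frame.
            continue
        kept.append(frame)
    return kept
```

Because each frame is compared with the most recently kept frame, every pair of adjacent images in the returned sequence has a similarity less than or equal to the threshold.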

In a possible implementation manner, in the video processing method provided in the embodiment of the present invention, S203 in the above embodiment may include S203d and S203e:

S203d, in response to a second input of the user on the N frames of images, the terminal determines M frames of target images from the N frames of images.

In the method provided by the embodiment of the present invention, before S203d, the terminal may receive a second input from the user.

Optionally, the N frames of images may be displayed on a video interface of the terminal. In this case, the second input may be a selection input by the user of the M frames of target images among the N frames of images displayed on the video interface.

Similarly, the description of the manner of the second input may refer to the description of the first input in the above embodiment, and the description of the embodiment of the present invention is not repeated here.

S203e, for each frame of target image in the M frames of target images, the terminal acquires an image frame sequence corresponding to that target image, to obtain M image frame sequences, where each image in an image frame sequence is an image among the N frames of images, other than the target image, whose picture similarity with the target image is less than or equal to the preset threshold.

It can be understood that the M frames of target images are selected by the user, that is, they are images meeting the user's requirements. In addition, the picture similarity between any two frames of images in the image frame sequence determined by the terminal according to each frame of target image is less than or equal to the preset threshold, that is, the images in the image frame sequence are not repeated. Therefore, the M image frame sequences obtained by the terminal are all image frame sequences meeting the user's requirements.

It should be noted that, with the video processing method provided in the embodiment of the present invention, the terminal may obtain M image frame sequences meeting the user's requirements according to the M frames of target images selected by the user. Therefore, the accuracy of the video segments of the target video obtained by the terminal is higher; that is, the M video segments serve more accurately as summaries of the target video, and the effect of the user controlling the terminal to view the video segments and browse the target video is better. This in turn helps to further improve the convenience of playing videos on the terminal.
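A minimal Python sketch of S203d and S203e follows, assuming the user's second input has already been resolved into the indices of the selected target frames. The `similarity` callable and the name `sequences_for_targets` are hypothetical.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def sequences_for_targets(
    frames: List[T],
    target_indices: List[int],
    similarity: Callable[[T, T], float],
    threshold: float,
) -> Dict[int, List[T]]:
    """For each selected target frame, collect the other frames whose similarity
    to that target is <= threshold, keeping the original frame order."""
    result: Dict[int, List[T]] = {}
    for t in target_indices:
        target = frames[t]
        result[t] = [
            frame
            for i, frame in enumerate(frames)
            if i != t and similarity(target, frame) <= threshold
        ]
    return result
```

Each selected target yields exactly one sequence, so M selected targets produce M image frame sequences.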

In a possible implementation manner, in the video processing method provided by the embodiment of the present invention, S203e in the above embodiment may include S203e' and S203e'':

S203e', for each frame of target image in the M frames of target images, the terminal acquires a target image frame sequence corresponding to that target image, to obtain M target image frame sequences.

S203e'', for each of the M target image frame sequences, when the picture similarity between a third image and a fourth image in the target image frame sequence is greater than the preset threshold, the terminal deletes the third image to obtain one image frame sequence, thereby obtaining M image frame sequences, where the third image and the fourth image are any two adjacent frames of images in the target image frame sequence.

Optionally, in one target image frame sequence, the frame number of the third image may be greater than that of the fourth image, or the frame number of the third image may be smaller than that of the fourth image.

It can be understood that, when the similarity between the third image and the fourth image is greater than the preset threshold, it indicates that the similarity between the third image and the fourth image is high, i.e. the third image and the fourth image can be regarded as a repeated image.

It should be noted that, in the video processing method provided in the embodiment of the present invention, the terminal may delete one image (for example, the third image) of the third image and the fourth image that are repeated in each target image frame sequence, so as to obtain a target image frame sequence with no image repetition and relatively simple image. Therefore, the accuracy of the video clip of the target video obtained by the terminal can be improved, and the effect of browsing the video through the video clip by the user is better. Furthermore, the convenience of playing videos by the terminal can be further improved.

In a possible implementation manner, in the video processing method provided by the embodiment of the present invention, synthesizing the images in one image frame sequence to obtain one video segment may be implemented through S204a:

S204a. In response to a third input from the user on the images in one of the M image frame sequences, the terminal synthesizes the images in the image frame sequence selected by the user to obtain a video clip.

In the method provided by the embodiment of the present invention, before S204a, the terminal may receive a third input from the user.

Optionally, images in each of the M image frame sequences may be displayed on a video interface of the terminal. At this time, the third input may be a user selection input of an image in one of the M image frame sequences displayed on the video interface.

Similarly, for the description of the manner of the third input, reference may be made to the description related to the first input in the foregoing embodiment, and details are not repeated herein in the embodiment of the present invention.

It should be noted that, in the video processing method provided in the embodiment of the present invention, after the terminal obtains the relatively streamlined M image frame sequences corresponding to the target video, the terminal may further receive a selection input of the user on the images in each of the M image frame sequences. Therefore, the M image frame sequences determined by the user's selection input can better meet the user's requirements, the accuracy of the M video clips of the target video obtained by the terminal can be further improved, and the user can browse the target video through the M video clips more effectively.

In a possible implementation manner, the video processing method provided in the embodiment of the present invention may further include, after S204 provided in the above embodiment, S205:

S205. The terminal sends the target video and a correspondence to the server, where the correspondence is the correspondence between the target video and the M video clips.

Specifically, after the terminal uploads the target video to the server, the target video is not stored in the terminal.

It can be understood that, with the development of video coding technology and the increasing resolution of the display screen in the terminal, the size of the video is larger and larger, i.e. the storage space occupied in the terminal is larger.

Optionally, the correspondence between the target video and the M video segments may be an index relationship between the M video segments and the target video.

Optionally, one video segment itself may serve as an index of the target video, or one video segment may correspond to one index control, which may be used to index from the video segment to the target video.

Specifically, when a video segment (denoted as video segment 1) serves as an index of the target video, the user can perform a fourth input on that video segment in the terminal, so that the terminal requests the server to play the target video in the form of streaming media.

Illustratively, the fourth input may be an input by the user on video segment 1 itself, or an input on the index control corresponding to video segment 1.

Similarly, the description of the manner of the fourth input may refer to the description of the first input in the foregoing embodiment, and the description of the embodiment of the present invention is not repeated here.

It should be noted that, with the video processing method provided in the embodiment of the present invention, the terminal may use the correspondence between the target video and the M video segments of the target video, so that the terminal may request the server to play the target video in the form of streaming media through the M video segments. Therefore, the storage space of the terminal can be saved while the terminal plays the target video.
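The index relationship between the M video segments and the target video can be illustrated with a small lookup table. The following Python sketch is only one possible reading: the class `Correspondence`, the function `build_stream_request`, the identifier strings, and the shape of the request dictionary are all hypothetical, since the text does not specify a data format or a protocol between the terminal and the server.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Correspondence:
    """Hypothetical index from video segment identifiers to their target video."""

    segment_to_video: Dict[str, str] = field(default_factory=dict)

    def register(self, video_id: str, segment_ids: List[str]) -> None:
        # Record that each of the M segments indexes the same target video.
        for segment_id in segment_ids:
            self.segment_to_video[segment_id] = video_id

    def resolve(self, segment_id: str) -> str:
        """Return the target video that the given segment indexes."""
        return self.segment_to_video[segment_id]


def build_stream_request(correspondence: Correspondence, segment_id: str) -> Dict[str, str]:
    """Hypothetical request the terminal could send after a fourth input on a segment."""
    return {"action": "stream", "video_id": correspondence.resolve(segment_id)}


correspondence = Correspondence()
correspondence.register("target_video", ["segment_1", "segment_2"])
request = build_stream_request(correspondence, "segment_1")
print(request)
```

In this sketch the terminal only needs to keep the segments and the lookup table locally, which matches the stated goal of saving storage space while the full target video stays on the server.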

Fig. 3 is a schematic diagram of a possible structure of a terminal 30 according to an embodiment of the present invention. The terminal 30 shown in fig. 3 includes a receiving module 301, an obtaining module 302, and an integrating module 303. The receiving module 301 is configured to receive a first input of a user, where the first input is used to trigger the terminal 30 to determine a target video. The obtaining module 302 is configured to obtain N frames of images of the target video in response to the first input received by the receiving module 301, where N is a positive integer, and to acquire M image frame sequences according to the N frames of images, where the picture similarity between any two adjacent images in each of the M image frame sequences is less than or equal to a preset threshold, M is a positive integer, and M is less than or equal to N. The integrating module 303 is configured to synthesize the images in each of the M image frame sequences acquired by the obtaining module 302 to obtain M video segments.

Optionally, the content type of each frame image in each of the M image frame sequences is the same.

Optionally, the obtaining module 302 is specifically configured to divide the N-frame image into a plurality of candidate image frame sequences according to content types of the N-frame image, where the plurality of candidate image frame sequences belong to M groups, each group includes an image frame sequence of at least one content type, and content types corresponding to each group are different; and acquiring the image frame sequence with the maximum frame number in each group to obtain M image frame sequences.
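The grouping and selection performed by the obtaining module 302 can be sketched as follows. This Python sketch rests on assumptions that the text does not confirm: each frame already carries a content type label (a classifier is assumed to exist), a candidate image frame sequence is taken to be a run of consecutive frames with the same label, and each group is taken to hold the candidates of one content type. The function names are illustrative.

```python
from itertools import groupby
from typing import Dict, List


def split_into_candidates(content_types: List[str]) -> List[List[int]]:
    """Split frame indices into runs of consecutive frames sharing one content type."""
    candidates: List[List[int]] = []
    for _, run in groupby(enumerate(content_types), key=lambda pair: pair[1]):
        candidates.append([index for index, _ in run])
    return candidates


def longest_per_group(content_types: List[str],
                      candidates: List[List[int]]) -> Dict[str, List[int]]:
    """For each content type (group), keep the candidate with the most frames."""
    selected: Dict[str, List[int]] = {}
    for candidate in candidates:
        label = content_types[candidate[0]]
        if label not in selected or len(candidate) > len(selected[label]):
            selected[label] = candidate
    return selected


labels = ["person", "person", "scenery", "person", "person", "person",
          "scenery", "scenery"]
candidates = split_into_candidates(labels)
selected = longest_per_group(labels, candidates)
print(selected)
```

Here N is 8 and the two content types yield M = 2 image frame sequences, so M is less than or equal to N as required. When two candidates of the same type have equal length, this sketch keeps the earlier one; the text does not say how ties are resolved.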

Optionally, the terminal 30 may further include a deleting module. The deleting module is configured to, before the obtaining module 302 obtains the image frame sequence with the largest frame number in each group, for each candidate image frame sequence in the multiple candidate image frame sequences, delete a first image when the picture similarity between the first image and a second image in the candidate image frame sequence is greater than the preset threshold, where the first image and the second image are any two adjacent frames of images in the candidate image frame sequence.

Optionally, the obtaining module 302 is specifically configured to determine, in response to a second input of the user on the N frames of images, M frames of target images from the N frames of images; and, for each frame of target image in the M frames of target images, acquire the image frame sequence corresponding to that target image, so as to obtain the M image frame sequences, where each frame of image in an image frame sequence is an image in the N frames of images that is not a target image and whose picture similarity with the corresponding target image is less than or equal to the preset threshold.

Optionally, the obtaining module 302 is specifically configured to obtain, for each frame of target image in the M frames of target images, the target image frame sequence corresponding to that target image, so as to obtain M target image frame sequences; and, for each target image frame sequence in the M target image frame sequences, when the picture similarity between a third image and a fourth image in the target image frame sequence is greater than the preset threshold, delete the third image to obtain one image frame sequence, so as to obtain the M image frame sequences, where the third image and the fourth image are any two adjacent frames of images in the target image frame sequence.

Optionally, the integrating module 303 is specifically configured to, in response to a third input of the user to the image in one of the M image frame sequences, synthesize the image in the image frame sequence selected by the user, so as to obtain a video segment.

Optionally, the terminal 30 may further include an uploading module. The uploading module is configured to, after the integrating module 303 synthesizes the images in each of the M image frame sequences acquired by the obtaining module 302 to obtain the M video segments, send the target video and a correspondence to the server, where the correspondence is the correspondence between the target video and the M video segments.

It should be noted that, with the terminal provided in the embodiment of the present invention, the terminal can autonomously generate a plurality of video segments (for example, M video segments) from a complete target video, where the picture similarity between any two adjacent frames of images in each video segment is less than or equal to the preset threshold, without requiring the user to manually select multiple video clips of the target video. Each of the plurality of video segments may serve as a summary of the target video. Therefore, the user can control the terminal to play the video segments and thereby quickly browse the target video, which improves the convenience of playing videos on the terminal.

The terminal 30 provided in the embodiment of the present invention can implement each process implemented by the terminal in the foregoing method embodiments, and is not described here again to avoid repetition.
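The division of work among the receiving, obtaining, and integrating modules can be pictured as a small pipeline. The Python sketch below is illustrative only: the class `TerminalSketch`, its field names, and the three injected callables are hypothetical stand-ins, and decoding, sequence selection, and synthesis are deliberately left to the caller because the text does not tie them to any particular library.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = str  # stand-in for a decoded image
Clip = str   # stand-in for a synthesized video segment


@dataclass
class TerminalSketch:
    """Hypothetical composition of the modules of terminal 30."""

    decode: Callable[[str], List[Frame]]                # obtaining module: video -> N frames
    select: Callable[[List[Frame]], List[List[Frame]]]  # obtaining module: N frames -> M sequences
    synthesize: Callable[[List[Frame]], Clip]           # integrating module: sequence -> clip

    def on_first_input(self, target_video: str) -> List[Clip]:
        """Receiving module entry point: the first input determines the target video."""
        frames = self.decode(target_video)
        sequences = self.select(frames)
        return [self.synthesize(sequence) for sequence in sequences]


terminal = TerminalSketch(
    decode=lambda video: list(video),                  # toy decoder: one frame per character
    select=lambda frames: [frames[:2], frames[2:]],    # toy selection into M = 2 sequences
    synthesize=lambda sequence: "".join(sequence),     # toy synthesis: concatenate frames
)
clips = terminal.on_first_input("abcd")
print(clips)
```

Passing the three steps in as callables keeps the sketch independent of how similarity is measured or how clips are encoded; any of the optional behaviors described for the obtaining module could be supplied as the `select` step.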

Fig. 4 is a schematic diagram of a hardware structure of a terminal according to an embodiment of the present invention, where the terminal 100 includes, but is not limited to: radio frequency unit 101, network module 102, audio output unit 103, input unit 104, sensor 105, display unit 106, user input unit 107, interface unit 108, memory 109, processor 110, and power supply 111. Those skilled in the art will appreciate that the terminal configuration shown in fig. 4 is not intended to be limiting, and that the terminal may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the terminal includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.

The user input unit 107 is configured to receive a first input of a user, where the first input is used to trigger the terminal 100 to determine a target video. The processor 110 is configured to, in response to the first input received by the user input unit 107, acquire N frames of images of the target video, where N is a positive integer; acquire M image frame sequences according to the N frames of images, where the picture similarity between any two adjacent images in each of the M image frame sequences is less than or equal to a preset threshold, M is a positive integer, and M is less than or equal to N; and synthesize the images in each of the M image frame sequences respectively to obtain M video segments.

It should be noted that, with the terminal provided in the embodiment of the present invention, the terminal can autonomously generate a plurality of video segments (for example, M video segments) from a complete target video, where the picture similarity between any two adjacent frames of images in each video segment is less than or equal to the preset threshold, without requiring the user to manually select multiple video clips of the target video. Each of the plurality of video segments may serve as a summary of the target video. Therefore, the user can control the terminal to play the video segments and thereby quickly browse the target video, which improves the convenience of playing videos on the terminal.

It should be understood that, in the embodiment of the present invention, the radio frequency unit 101 may be used for receiving and sending signals during a message transmission or call process, and specifically, after receiving downlink data from a base station, the downlink data is processed by the processor 110; in addition, the uplink data is transmitted to the base station. Typically, radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 can also communicate with a network and other devices through a wireless communication system.

The terminal provides wireless broadband internet access to the user through the network module 102, such as helping the user send and receive e-mails, browse web pages, access streaming media, and the like.

The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the network module 102 or stored in the memory 109 into an audio signal and output as sound. Also, the audio output unit 103 may also provide audio output related to a specific function performed by the terminal 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 103 includes a speaker, a buzzer, a receiver, and the like.

The input unit 104 is used to receive an audio or video signal. The input unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of a still picture or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or other storage medium) or transmitted via the radio frequency unit 101 or the network module 102. The microphone 1042 may receive sound and process it into audio data. In a phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 101, and then output.

The terminal 100 also includes at least one sensor 105, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 1061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 1061 and/or a backlight when the terminal 100 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the terminal posture (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration identification related functions (such as pedometer, tapping), and the like; the sensors 105 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.

The display unit 106 is used to display information input by a user or information provided to the user. The Display unit 106 may include a Display panel 1061, and the Display panel 1061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.

The user input unit 107 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the terminal. Specifically, the user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations performed by the user on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory). The touch panel 1071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 110, and receives and executes commands sent by the processor 110. In addition, the touch panel 1071 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave types. In addition to the touch panel 1071, the user input unit 107 may include other input devices 1072. Specifically, the other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.

Further, the touch panel 1071 may be overlaid on the display panel 1061, and when the touch panel 1071 detects a touch operation thereon or nearby, the touch panel 1071 transmits the touch operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although in fig. 4, the touch panel 1071 and the display panel 1061 are two independent components to implement the input and output functions of the terminal, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the terminal, which is not limited herein.

The interface unit 108 is an interface for connecting an external device to the terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the terminal 100 or may be used to transmit data between the terminal 100 and the external device.

The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the terminal, and the like. Further, the memory 109 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The processor 110 is a control center of the terminal, connects various parts of the entire terminal using various interfaces and lines, and performs various functions of the terminal and processes data by operating or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby performing overall monitoring of the terminal. Processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.

The terminal 100 may further include a power supply 111 (e.g., a battery) for supplying power to various components, and preferably, the power supply 111 may be logically connected to the processor 110 via a power management system, so as to manage charging, discharging, and power consumption management functions via the power management system.

In addition, the terminal 100 includes some functional modules that are not shown, and thus, the detailed description thereof is omitted.

Preferably, an embodiment of the present invention further provides a terminal, which includes a processor 110, a memory 109, and a computer program stored in the memory 109 and capable of running on the processor 110, where the computer program is executed by the processor 110 to implement the processes of the foregoing method embodiments, and can achieve the same technical effects, and details are not repeated here to avoid repetition.

The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the processes of the method embodiments, and can achieve the same technical effects, and in order to avoid repetition, the details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A video processing method, characterized by comprising:

receiving a first input from a user, where the first input is used to trigger a terminal to determine a target video;

in response to the first input, obtaining N frames of images of the target video, where N is a positive integer;

obtaining M image frame sequences according to the N frames of images, where the picture similarity between any two adjacent images in each of the M image frame sequences is less than or equal to a preset threshold, M is a positive integer, and M is less than or equal to N;

synthesizing the images in each of the M image frame sequences respectively to obtain M video clips;

wherein, for each image frame sequence in the M image frame sequences, the content type of each frame of image in one image frame sequence is the same;

the obtaining M image frame sequences according to the N frames of images includes:

dividing the N frames of images into multiple candidate image frame sequences according to the content types of the N frames of images, where the multiple candidate image frame sequences belong to M groups, each group includes image frame sequences of multiple content types, and the content type corresponding to each group is different;

obtaining the image frame sequence with the largest number of frames in each group to obtain the M image frame sequences.

2. The method according to claim 1, wherein before acquiring the image frame sequence with the largest number of frames in each group, the method further comprises:

for each candidate image frame sequence in the multiple candidate image frame sequences, when the picture similarity between a first image and a second image in one candidate image frame sequence is greater than the preset threshold, deleting the first image, where the first image and the second image are any two adjacent images in the one candidate image frame sequence.

3. A terminal, comprising: a receiving module, an acquiring module and an integrating module;

The receiving module is configured to receive a first input from a user, where the first input is used to trigger the terminal to determine a target video;

The acquiring module is configured to obtain N frames of images of the target video in response to the first input received by the receiving module, and to obtain M image frame sequences according to the N frames of images, where the picture similarity between any two adjacent images in each of the M image frame sequences is less than or equal to a preset threshold, N and M are both positive integers, and M is less than or equal to N;

The integration module is used for synthesizing the images in each of the M image frame sequences acquired by the acquiring module, respectively, to obtain M video clips;

Wherein, for each image frame sequence in the M image frame sequences, the content type of each frame image in one image frame sequence is the same;

The acquiring module is specifically configured to divide the N frames of images into multiple candidate image frame sequences according to the content types of the N frames of images, where the multiple candidate image frame sequences belong to M groups, each group includes image frame sequences of multiple content types, and each group corresponds to a different content type; and to acquire the image frame sequence with the largest number of frames in each group to obtain the M image frame sequences.

4. The terminal according to claim 3, wherein the terminal further comprises: a deletion module;

The deletion module is configured to, before the acquiring module acquires the image frame sequence with the largest number of frames in each group, for each candidate image frame sequence in the multiple candidate image frame sequences, delete a first image when the picture similarity between the first image and a second image in one candidate image frame sequence is greater than the preset threshold, where the first image and the second image are any two adjacent images in the one candidate image frame sequence.

5. A terminal, characterized by comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the video processing method as claimed in claim 1 or 2.

6. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the video processing method as claimed in claim 1 or 2 are implemented.