CN114979741B - Method, device, computer equipment and storage medium for playing video
- Publication number: CN114979741B (application CN202110190796.1A)
- Authority: CN (China)
- Prior art keywords: commentary, moving picture, commentator, picture, portrait
- Legal status: Active (assumed status, not a legal conclusion)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/272—Means for inserting a foreground image in a background image, i.e. inlay, outlay
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Processing Or Creating Images (AREA)
Abstract
The application relates to a method, an apparatus, a computer device and a storage medium for playing video. The method relates to image segmentation and image recognition technologies in the field of computer vision, and comprises the following steps: displaying a video playing interface; displaying a moving picture on the video playing interface; and displaying a commentator portrait at a first position of the moving picture, wherein the first position matches a commentary object of the commentator. By adopting the method, the efficiency with which a user acquires information from the video picture can be improved.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a computer device, and a storage medium for playing video.
Background
With the development of computer technology and internet technology, video has become one of the important channels through which people acquire information, and the types of video are becoming increasingly diverse. In recent years, for videos presenting specific content, commentary voice is often configured for the video in order to help users better understand the presented content, for example sports commentary and competitive game commentary.
However, in this manner of video commentary, the commentary voice and the video pictures are usually simply transmitted to the user. When the connection between the commentary voice and the video pictures is not tight enough, the ability of the video to convey information to the user is limited. For example, when the commentary voice is too fast, the user cannot obtain information about the video pictures from the commentary voice and can only understand the video content from the pictures themselves, which results in low efficiency of information acquisition from the video.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, a computer device, and a storage medium that can improve the efficiency with which a user acquires information from a video picture.
A method of playing video, the method comprising:
displaying a video playing interface;
displaying a moving picture on the video playing interface;
displaying a commentator portrait at a first position of the moving picture;
wherein the first position matches a commentary object of the commentator.
In one embodiment, the method further comprises:
when the commentary object moves out of the moving picture, cancelling the display of the commentator portrait at the first position in the moving picture that matches the commentary object.
In one embodiment, the method further comprises:
when another object moves into the first position in the moving picture that matches the commentary object, cancelling the display of the commentator portrait at that first position.
In one embodiment, the method further comprises:
when the commentary object matched with the commentary content about the moving picture, which is collected along with the commentator picture, changes from a first commentary object to a second commentary object, and the second commentary object is not included in the moving picture, cancelling the display of the commentator portrait at the first position in the moving picture that matches the first commentary object.
In one embodiment, the method further comprises:
when the commentary object matched with the commentary content about the moving picture, which is collected along with the commentator picture, changes from a first commentary object to a second commentary object in the moving picture, displaying the commentator portrait of the commentator picture acquired in synchronization with the moving picture at a first position in the moving picture that matches the second commentary object.
In one embodiment, displaying the commentator portrait of the commentator picture acquired in synchronization with the moving picture at the first position in the moving picture that matches the second commentary object includes:
when the distance between the second commentary object and the first commentary object in the moving picture is smaller than a threshold value, displaying the commentator portrait gradually moving from the first position matched with the first commentary object to the first position matched with the second commentary object in the moving picture.
In one embodiment, displaying the commentator portrait of the commentator picture acquired in synchronization with the moving picture at the first position in the moving picture that matches the second commentary object includes:
when the distance between the second commentary object and the first commentary object in the moving picture is greater than or equal to the threshold value, cancelling the display of the commentator portrait at the first position in the moving picture matched with the first commentary object;
and displaying the commentator portrait at the first position in the moving picture matched with the second commentary object.
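The transition rule in the two embodiments above can be summarised in a short sketch. The following Python fragment is illustrative only: the function name, the threshold value and the step count are assumptions rather than values given by the patent, and the pixel-distance test stands in for whatever distance measure an implementation actually uses.

```python
# Hypothetical sketch of the portrait-transition rule described above: if the two
# commentary objects are close, the portrait glides between them; otherwise it is
# hidden and redrawn at the new first position.

def transition_positions(old_pos, new_pos, distance_threshold=80.0, steps=10):
    """Return the sequence of portrait positions used for one transition."""
    dx, dy = new_pos[0] - old_pos[0], new_pos[1] - old_pos[1]
    distance = (dx * dx + dy * dy) ** 0.5
    if distance < distance_threshold:
        # Gradual move: interpolate from the old first position to the new one.
        return [(old_pos[0] + dx * t / steps, old_pos[1] + dy * t / steps)
                for t in range(1, steps + 1)]
    # Far apart: cancel the old display and show the portrait directly at the new position.
    return [new_pos]

print(transition_positions((100, 100), (140, 130)))   # close: interpolated path
print(transition_positions((100, 100), (600, 400)))   # far: jump to the new position
```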
In one embodiment, the method further comprises:
when the commentary object moves within the moving picture, displaying a commentator portrait that moves within the moving picture at the first position matched with the moving commentary object.
In one embodiment, the method further comprises:
when the acquisition view angle of the moving picture shifts, displaying the commentator portrait of the commentator picture acquired in synchronization with the moving picture at a first position which matches the commentary object in the moving picture and gradually shifts along with the shift of the acquisition view angle.
In one embodiment, the method further comprises:
when the size of the commentary object in the moving picture changes, displaying, at the first position matched with the commentary object in the moving picture, a commentator portrait whose size changes synchronously with the change in size of the commentary object.
In one embodiment, the method further comprises:
displaying, in the moving picture, an object identification corresponding to an object in an area adjacent to that object.
In one embodiment, the method further comprises:
acquiring commentary content about the moving picture, the commentary content being collected along with the commentator picture;
determining, in the synchronously acquired moving picture, a commentary object matching the commentary content.
In one embodiment, determining a commentary object matching the commentary content in a synchronously acquired moving picture comprises:
performing semantic recognition on the commentary content to obtain an object identification in the commentary content;
determining object identifications of the objects in the synchronously acquired moving picture;
determining the commentary object from the moving picture based on the object identification in the commentary content and the object identifications of the objects in the moving picture.
In one embodiment, the method further comprises:
acquiring a commentator picture collected in synchronization with the moving picture;
performing image segmentation on the commentator picture to obtain a segmentation result corresponding to the commentator picture;
segmenting the region where the commentator is located from the commentator picture according to the segmentation result to obtain the commentator portrait.
In one embodiment, the method further comprises:
acquiring the picture size and the real size of the commentary object;
determining a ratio between the picture size and the real size of the commentary object;
obtaining, according to the ratio and the real size of the commentator in the commentator picture, a picture size suitable for the commentator portrait displayed in the moving picture.
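As a rough illustration of this ratio-based sizing, the sketch below assumes heights measured in pixels for the picture and metres for the real world; the function name and the example numbers are assumptions, not values from the patent.

```python
# A minimal sketch of the scaling rule above: the ratio between the commentary object's
# picture size and its real size is applied to the commentator's real size to obtain a
# display size consistent with the scale of the moving picture.

def portrait_display_height(object_picture_h, object_real_h, commentator_real_h):
    """Heights in consistent units (pixels for picture sizes, metres for real sizes)."""
    ratio = object_picture_h / object_real_h          # pixels per metre in the moving picture
    return ratio * commentator_real_h                 # commentator height in pixels

# e.g. a 1.80 m player shown 300 px tall, commentator 1.75 m tall -> about 291.7 px
print(portrait_display_height(300, 1.80, 1.75))
```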
In one embodiment, the method further comprises:
embedding the commentator portrait obtained from the commentator picture into the moving picture at the first position matched with the commentary object in the moving picture, to obtain a fused picture comprising the commentator portrait.
In one embodiment, displaying a moving picture on a video playing interface comprises:
displaying a moving picture collected from a sports competition on the video playing interface;
and displaying a commentator portrait at a first position of the moving picture comprises:
displaying, in the moving picture, at a first position matched with a target player in the moving picture, a commentator portrait of a competition commentator picture acquired in synchronization with the moving picture.
An apparatus for playing video, the apparatus comprising:
a display module, configured to display a video playing interface;
the display module is further configured to display a moving picture on the video playing interface;
the display module is further configured to display a commentator portrait at a first position of the moving picture, wherein the first position matches a commentary object of the commentator.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
displaying a video playing interface;
displaying a moving picture on the video playing interface;
displaying a commentator portrait at a first position of the moving picture;
wherein the first position matches a commentary object of the commentator.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
displaying a video playing interface;
displaying a moving picture on the video playing interface;
displaying a commentator portrait at a first position of the moving picture;
wherein the first position matches a commentary object of the commentator.
A computer program comprising computer instructions stored in a computer readable storage medium, the computer instructions being read from the computer readable storage medium by a processor of a computer device, the computer instructions being executed by the processor to cause the computer device to perform the steps of the method of playing video described above.
According to the method, the apparatus, the computer device and the storage medium for playing video, a commentator portrait is displayed at a first position of the moving picture. Because the first position matches the commentary object of the commentator, a user watching the video can quickly learn which object the commentator is currently explaining from the position of the commentator portrait, and can quickly understand the video content with the help of the portrait, thereby improving the efficiency of obtaining video information while watching the video.
Drawings
FIG. 1 is an application environment diagram of a method of playing video in one embodiment;
FIG. 2 is a flow chart of a method of playing video in one embodiment;
FIG. 3 (a) is a schematic diagram of capturing a live segment in one embodiment;
FIG. 3 (b) is a schematic diagram of capturing a commentator live segment in one embodiment;
FIG. 4 is a block flow diagram of image segmentation of an commentator screen in one embodiment;
FIG. 5 is a flow diagram of determining a commentary object from a moving picture in one embodiment;
FIG. 6 (a) is a schematic diagram of a commentator portrait embedding position in one embodiment;
FIG. 6 (b) is a schematic diagram of a commentator portrait embedding position in another embodiment;
FIG. 6 (c) is a schematic diagram of a commentator portrait embedding position in yet another embodiment;
FIG. 7 (a) is a schematic diagram of a commentator portrait embedding position in yet another embodiment;
FIG. 7 (b) is a schematic diagram of a commentator portrait embedding position in yet another embodiment;
FIG. 8 is a flow diagram of identifying regions in a moving picture where objects are located, according to one embodiment;
FIG. 9 (a) is a schematic diagram of cancelling display of a commentator portrait in one embodiment;
FIG. 9 (b) is a schematic diagram of cancelling display of a commentator portrait in another embodiment;
FIG. 10 (a) is a schematic diagram of switching a commentary object in one embodiment;
FIG. 10 (b) is a schematic diagram of switching a commentary object in another embodiment;
FIG. 11 is a schematic diagram of moving the embedding position of a commentator portrait in one embodiment;
FIG. 12 is a schematic diagram of moving the embedding position of a commentator portrait in another embodiment;
FIG. 13 is a schematic diagram of a change in commentator portrait size in one embodiment;
FIG. 14 is a schematic diagram of displaying an object identification corresponding to an object in one embodiment;
FIG. 15 is a flow diagram of a method of playing video in one embodiment;
FIG. 16 is a flowchart of a method for playing video according to another embodiment;
FIG. 17 is a flow chart of a method of playing video in another embodiment;
FIG. 18 is a block diagram of an apparatus for playing video in one embodiment;
fig. 19 is an internal structural view of the computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The embodiments of the application provide a method for playing video that relates to Artificial Intelligence (AI) technology. Artificial intelligence is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The embodiments of the application provide a method for playing video that mainly relates to Computer Vision (CV), a branch of artificial intelligence. Computer vision is the science of studying how to make machines "see"; more specifically, it uses cameras and computers in place of human eyes to identify, track and measure targets, and further performs graphics processing so that the result becomes an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image segmentation, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
The embodiments of the application provide a method for playing video that mainly relates to image segmentation and image recognition in the field of computer vision. An image is made up of pixels, and image segmentation divides the pixels in an image according to the different semantics they express. For example, in the embodiments of the application, image segmentation is performed on the commentator picture acquired in synchronization with the moving picture to obtain a segmentation result corresponding to the commentator picture, and the region where the commentator is located is segmented from the commentator picture according to the segmentation result to obtain the commentator portrait. Image recognition identifies, from an image, the region where an object is located, the category of the object, and so on. For example, in the embodiments of the application, image recognition is performed on the moving picture to obtain the region in which each object is located, so as to determine a blank area in the moving picture from which an embedding position suitable for the commentator portrait is selected.
The embodiments of the application also relate to Natural Language Processing (NLP), an important direction in the fields of computer science and artificial intelligence. Natural language processing studies theories and methods that enable effective communication between humans and computers in natural language. It is a science that integrates linguistics, computer science and mathematics; research in this field therefore involves natural language, i.e. the language people use daily, and is closely related to the study of linguistics. Natural language processing technologies typically include text processing, semantic understanding, machine translation, question answering, and knowledge graph technologies.
The embodiments of the application mainly relate to semantic understanding in the field of natural language processing. For example, in the embodiments of the application, semantic recognition is performed on the commentary content about the moving picture, which is collected along with the commentator picture, to obtain the object identification in the commentary content; the commentary object is then determined from the moving picture according to the object identification in the commentary content and the object identifications of the objects in the synchronously acquired moving picture, and the commentator portrait is embedded at the first position in the blank area of the moving picture that matches the commentary object.
The method for playing video provided by the application can be applied to the application environment shown in FIG. 1, in which the playback terminal 102 communicates with the server 104 via a network. The playback terminal 102 may display a video playing interface, display a moving picture on the video playing interface, and display a commentator portrait at a first position of the moving picture, wherein the first position matches a commentary object of the commentator. The server 104 may acquire a moving picture collected from a live event and transmit it to the playback terminal 102 for display. The server 104 may also acquire the moving picture collected from the live event together with a commentator picture acquired in synchronization with it, determine a commentary object matching the commentary content in the moving picture based on the commentary content collected along with the commentator picture, embed the commentator portrait obtained from the commentator picture into the moving picture at the first position matching the commentary object, obtain a fused picture including the commentator portrait, and send the fused picture to the playback terminal 102, so that the playback terminal 102 displays the commentator portrait at the first position matching the commentary object in the moving picture.
In other embodiments, the playback terminal 102 may independently complete the synthesis and display of the video. Specifically, the playback terminal 102 can acquire and display a moving picture collected from a live event; it may also acquire the moving picture and a commentator picture acquired in synchronization with it, determine a commentary object matching the commentary content in the moving picture from the commentary content collected along with the commentator picture, and display the commentator portrait at a first position matching the commentary object in the moving picture.
The playing terminal 102 may be, but not limited to, various personal computers, notebook computers, smartphones, tablet computers, portable wearable devices, televisions, etc., and the server 104 may be implemented by a stand-alone server or a server cluster formed by a plurality of servers.
The method for playing video provided by the embodiments of the application may be executed by the apparatus for playing video provided by the embodiments, or by a computer device into which the apparatus is integrated; the apparatus may be implemented in hardware or in software. The computer device may be the playback terminal 102 or the server 104 shown in FIG. 1.
The method for playing video provided by the embodiments of the application can be applied to live broadcast scenes, commentary video synthesis scenes, event broadcasting scenes, remote conference scenes, remote teaching scenes and the like, and is also suitable for television program production, internet program production and the like.
For example, in a live broadcast scene, a host's commentary voice for a live event is typically added to the live broadcast, while the live picture switches between the event venue and the host's live broadcast room. In a live broadcast of a football match, for instance, the host's commentary on the match is added to the live broadcast, and the live picture switches between the match venue and the host's live room. With the method of the embodiments of the application, the host portrait can be embedded into the match venue picture, and the position of the host portrait in the match venue picture is related to the object the host is commenting on.
In one embodiment, a live segment of the match and a commentator live segment recorded in synchronization with it are obtained. One frame of the match venue picture is taken from the match live segment, the host picture collected in synchronization with that frame is taken from the commentator live segment, the host portrait is segmented from the host picture, and the host portrait is embedded into the match venue picture, with its position related to the host's commentary on that picture. In this way, the host portrait is blended naturally and smoothly into the match live segment. A viewer can quickly learn which player the host is currently commenting on from the position of the host portrait in each frame, and quickly understand the live content with the help of the host portrait, which improves the efficiency with which viewers acquire live information. Switching the live picture back and forth between the match venue and the host's live room is also avoided, so viewers do not miss match content, which improves the comprehensiveness of the live information they acquire.
For another example, in a live broadcast scene, a tour guide or host typically introduces the location being visited during the live broadcast. For example, when introducing protected animals or dangerous animals in a zoo that cannot be approached, the picture seen by the user typically switches between the tour guide's live picture and the animals' live picture. With the method provided by the embodiments of the application, the synchronously recorded tour guide picture and animal picture can be obtained, the tour guide portrait is segmented from the tour guide picture and embedded into the animal picture, and the position of the tour guide portrait in the animal picture is related to the object the tour guide is describing.
For another example, in a video synthesis scene, a user usually adds commentary voice to a recorded video, and the synthesized video typically shows only the recorded video picture, or switches between the video picture and the commentator picture. For example, when a user adds commentary to a recorded game video, the synthesized video typically shows only the game picture. With the method of the embodiments of the application, the commentator portrait can be embedded into the game picture, and the position of the commentator portrait in the game picture is related to the object being commented on.
In one embodiment, a recorded game video and a commentary video synchronized in progress with it are obtained. One frame of the game picture is taken from the recorded game video, the commentator picture collected in synchronization with that frame is taken from the commentary video, the commentator portrait is segmented from the commentator picture, and the commentator portrait is embedded into the game picture, with its position related to the object being commented on in that picture. In this way, the commentator portrait is blended naturally and smoothly into the game video; a viewer can quickly learn which game object is currently being commented on from the commentator's position in each frame, and quickly understand the game play from the commentator's expressions, body language and the like, which improves the efficiency with which viewers acquire video information.
For another example, in a remote teaching scenario, a teacher typically presents and explains courseware remotely. With the method of the embodiments of the application, the teacher portrait can be embedded into the courseware picture, and the position of the teacher portrait in the courseware picture is related to the part of the courseware the teacher is explaining.
In one embodiment, a courseware live segment and a teacher live segment collected in synchronization with it are obtained. One frame of the courseware picture is taken from the courseware live segment, the teacher picture collected in synchronization with that frame is taken from the teacher live segment, the teacher portrait is segmented from the teacher picture, and the teacher portrait is embedded into the courseware picture, with its position related to the part of the courseware being explained. In this way, the teacher portrait is blended naturally and smoothly into the courseware live segment, and the fused segment is displayed to the students, who can quickly identify the courseware point currently being explained from the teacher's position in each frame and quickly understand the courseware content from the teacher's expressions, body language and the like, thereby improving the efficiency of remote learning.
In one embodiment, as shown in FIG. 2, a method for playing video is provided. This embodiment is described as being applied to a computer device (the playback terminal 102 or the server 104 in FIG. 1) by way of example, and includes the following steps:
step S202, displaying a video playing interface.
The video playing interface is an interface for playing video. The video consists of a plurality of continuous video pictures, and the video is played, namely the continuous video pictures are displayed in sequence.
In one embodiment, the playback terminal displays a video playback interface in response to a user operation. The video playing interface may be provided by the playing terminal, for example, in response to a user operation, the playing terminal is turned on, i.e. the video playing interface of the playing terminal is displayed. The video playing interface may also be provided by a browser, for example, the playing terminal starts the browser installed thereon to enter the video playing interface of the web video player in response to a user operation. The video playing interface may also be provided by a video playing client installed on the playing terminal, for example, the playing terminal responds to a user operation to start the video playing client installed thereon and enter the video playing interface of the video playing client. The video playing client is an application with a video playing function, and can be an application which specially provides video services, or an application which specially provides other services and simultaneously provides video services, such as a social application and the like.
In step S204, a moving picture is displayed on the video playing interface.
In one embodiment, the live event may be an event in a real scene, such as an athletic competition, a news interview scene, a documentary viewing scene, a performance scene, a program recording scene, and so forth. The recording terminal can acquire a continuous image sequence from the real scene through a built-in or externally connected camera to obtain multiple frames of moving pictures. In other embodiments, the live event may also be content presented on the terminal screen, such as an electronic game or a presentation. The recording terminal can record the display picture of the terminal screen through a built-in or externally connected camera to form a video stream that includes multiple frames of moving pictures.
In one embodiment, the playing terminal plays live broadcast segments acquired from live activities at a video playing interface, thereby displaying one frame of moving pictures of the live broadcast segments. The live segment is a part of video stream of a live resource, and specifically may be a video stream of a specified duration, such as a 15 second video stream, a 30 second video stream, and the like. The recording terminal collects continuous image sequences through a built-in or externally connected camera to form a live broadcast segment, the live broadcast segment is uploaded to the server, the server processes each moving picture in the live broadcast segment, the processed live broadcast segment is pushed to the playing terminal, so that a communication link is formed among the recording terminal, the server and the playing terminal, and the playing terminal pulls the live broadcast segment generated in real time from the server to play.
In other embodiments, the playing terminal plays, on the video playing interface, a recorded live video acquired from the live event, thereby displaying each frame of moving picture of the live video. The playing terminal may play a locally stored live video; it may also download the live video from the network, or obtain a live video transmitted by another computer device, for example a live video transmitted by the server.
Step S206, displaying a commentator portrait at a first position of the moving picture, wherein the first position matches a commentary object of the commentator.
The commentary object may be a living body, an object, a virtual character, a picture, or the like in the moving picture, for example an athlete, a referee or a goal in a sports competition picture. The first position is the position of the commentator portrait in the moving picture, and the second position is the position of the commentary object in the moving picture.
In one embodiment, the commentator portrait may be a pre-set image with a prompt function. In other embodiments, the commentator portrait may be obtained by matting a commentator picture acquired in synchronization with the moving picture.
The embodiments of the application involve synchronous acquisition of the moving picture and the commentator picture; for a moving picture and a commentator picture acquired synchronously, the commentary collected along with the commentator picture is used to describe the moving picture. The live video (or live segment) containing the moving picture and the commentator video (or commentator live segment) containing the commentator picture are kept synchronized in progress, so that the commentary collected with the commentator picture describes the moving picture. For example, if the moving pictures of a live video between 30 seconds and 50 seconds show a player's shooting action, the commentary of the commentary video between 30 seconds and 50 seconds explains that shooting action. To ensure synchronous acquisition of moving pictures and commentator pictures, the embodiments of the application may acquire the commentator pictures in the following ways:
The method for playing video provided by the embodiments of the application can embed the commentator portrait into a live segment. When processing a live segment, the commentator picture may be acquired as follows: ① while the live segment is recorded, the commentator live segment is recorded synchronously at the same frame rate as the live segment, so that the live segment and the commentator live segment are recorded in step and there is a one-to-one synchronous acquisition relationship between the moving pictures in the live segment and the commentator pictures in the commentator live segment; ② the commentator watches the live segment collected from the live event, and while watching it the commentator live segment about the live event is recorded at the same frame rate as the live segment, which likewise yields a live segment and a commentator live segment that are synchronized in progress, with a one-to-one synchronous acquisition relationship between the moving pictures in the live segment and the commentator pictures in the commentator live segment.
The method for playing video provided by the embodiments of the application can also embed the commentator portrait into a recorded video. When processing a recorded video, the commentator pictures may be acquired as follows: while the commentator watches the live video, the commentator video about the live event is recorded synchronously at the same frame rate as the live video, thereby obtaining a live video and a commentator video that are synchronized in progress, with a one-to-one synchronous acquisition relationship between the moving pictures in the live video and the commentator pictures in the commentator video.
For example, referring to FIG. 3 (a), which is a schematic diagram of capturing a live segment in one embodiment, the live segment is obtained by a capturing device recording the live event at a specified frame rate (e.g. a target frame rate). Referring to FIG. 3 (b), which is a schematic diagram of capturing a commentator live segment in one embodiment, the commentator watches the live segment collected from the live event while the commentator live segment about the live event is recorded at the same frame rate at which the live segment was captured. In this way, the collected commentator live segment and the live segment stay synchronized in progress, and the moving pictures in the live segment and the commentator pictures in the commentator live segment have a one-to-one synchronous acquisition relationship.
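A minimal sketch of what this one-to-one relationship means in practice is given below; the list-based frame representation and the helper name are assumptions made for illustration only.

```python
# Illustrative sketch only: because both streams are recorded at the same frame rate
# and kept synchronous in progress, the i-th moving picture pairs with the i-th
# commentator picture, so no timestamp resampling is needed in this sketch.

def pair_frames(live_frames, commentator_frames):
    """Return one-to-one (moving picture, commentator picture) pairs by index."""
    assert len(live_frames) == len(commentator_frames), "streams must stay in step"
    return list(zip(live_frames, commentator_frames))
```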
In one embodiment, the playing terminal plays a live segment acquired from a live event on the video playing interface and displays, at a first position of the live picture of the live segment, the commentator portrait of the commentator picture acquired in synchronization with it, the first position matching the commentary object of the commentator. In other embodiments, the playing terminal plays a recorded live video acquired from the live event on the video playing interface and displays, at a first position of the live picture of the live video, the commentator portrait of the commentator picture acquired in synchronization with it, the first position matching the commentary object of the commentator.
In one embodiment, a computer device obtains a commentator picture acquired in synchronization with the moving picture, performs image segmentation on the commentator picture to obtain a segmentation result corresponding to the commentator picture, and segments the region where the commentator is located from the commentator picture according to the segmentation result to obtain the commentator portrait.
In one embodiment, the computer device may employ a general image segmentation strategy to segment the commentator picture. An image is composed of pixels, and image segmentation divides the pixels in an image according to the different semantics they express. In some scenes the foreground or background needs to be segmented from the image; in other scenes a specific object needs to be segmented, or the image needs to be divided into different regions according to semantics; all of these fall within the scope of image segmentation. The image segmentation strategy may be a trained image segmentation model or a general image segmentation algorithm. An image segmentation model can be trained on sample images of a specific scene so that it is able to segment images of that scene.
Taking an image segmentation model as an example, the input of the model is the commentator picture and the output is the segmentation result corresponding to the commentator picture. For example, referring to FIG. 4, which is a block flow diagram of image segmentation of a commentator picture in one embodiment, the commentator picture is input into the image segmentation model, foreground/background segmentation is performed by the model, and the resulting segmentation result clearly reflects the region where the commentator is located; that region is then segmented from the commentator picture to obtain the commentator portrait.
In this embodiment, image segmentation of the commentator picture allows the commentator portrait to be accurately segmented from the commentator picture.
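The matting step can be pictured with the following sketch, which assumes that some foreground/background segmentation model (not specified here) has already produced a binary mask of the same size as the commentator picture; the function name and the RGBA output format are illustrative assumptions rather than the patent's prescription.

```python
# A sketch of the matting step under stated assumptions: a segmentation model returns a
# binary mask over the commentator picture, and the portrait is cut out with that mask
# by turning the mask into an alpha channel.
import numpy as np

def extract_commentator_portrait(commentator_frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """commentator_frame: H x W x 3 uint8; mask: H x W, 1 = commentator, 0 = background.
    Returns an H x W x 4 RGBA image whose alpha channel is the segmentation mask."""
    alpha = (mask.astype(np.uint8) * 255)[..., None]
    return np.concatenate([commentator_frame, alpha], axis=-1)

# frame = cv2.imread("commentator.png"); mask = segmentation_model(frame)  # assumed model
# portrait = extract_commentator_portrait(frame, mask)
```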
In one embodiment, a computer device obtains commentary content about the moving picture, which is collected along with the commentator picture, and determines a commentary object matching the commentary content in the synchronously acquired moving picture.
In one embodiment, the computer device may obtain commentary voice of a preset duration collected along with the commentator picture, determine the moving pictures covered by that duration of commentary voice, and determine the commentary object in those moving pictures according to the commentary content corresponding to that duration of commentary voice. In other embodiments, the computer device may instead obtain a preset number of moving pictures and determine the commentary object in them according to the commentary voice they cover.
In one embodiment, the computer device obtains the commentary voice about the moving picture collected along with the commentator picture, performs text conversion on the commentary voice to obtain a commentary text corresponding to it, and determines the commentary content from the commentary text. Alternatively, the commentary text itself may be taken as the commentary content.
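The windowing idea above can be sketched as follows. The helper names, the frame-rate arithmetic and the pass-through `asr` callable (a stand-in for whatever speech recognition service performs the text conversion) are assumptions for illustration only.

```python
# Hedged sketch: take a preset duration of commentary audio, locate the moving pictures
# it covers by frame rate, and obtain the commentary text from a hypothetical ASR callable.

def frames_covered_by_commentary(start_s: float, duration_s: float, fps: float):
    """Indices of the moving pictures covered by commentary audio in [start, start + duration)."""
    first = int(start_s * fps)
    last = int((start_s + duration_s) * fps)
    return list(range(first, last))

def commentary_text(audio_clip, asr=None):
    # `asr` is an assumed callable wrapping whatever speech recognition service is used.
    return asr(audio_clip) if asr else ""

frames = frames_covered_by_commentary(start_s=30.0, duration_s=5.0, fps=25)
print(frames[0], frames[-1], len(frames))   # 750 874 125
```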
In one embodiment, determining a commentary object matching the commentary content in the synchronously acquired moving picture comprises: performing semantic recognition on the commentary content to obtain an object identification in the commentary content; determining the object identifications of the objects in the synchronously acquired moving picture; and determining the commentary object from the moving picture based on the object identification in the commentary content and the object identifications of the objects in the moving picture.
The object identifier is identity information for characterizing the object, and is used for uniquely identifying the object, and may specifically be the name, number and the like of the object.
In one embodiment, the computer device may employ a generic semantic recognition policy to semantically recognize the narrative content to obtain an object identification in the narrative content. The semantic recognition strategy can be a trained semantic recognition model or a general semantic recognition algorithm. The semantic recognition model can learn through sample texts of specific scenes, so that the semantic recognition model has the capability of carrying out semantic recognition on the sample texts of the specific scenes.
In one embodiment, a computer device may identify facial image features of the objects in the moving picture and determine the object identifications of those objects. Facial image features are data reflecting facial characteristics in an image. Facial characteristics are physiological features inherent to a face, such as the iris, the positional relationships between facial organs (eyes, nose, mouth, ears, etc.), the structure of facial organs (shape, size, etc.), and skin texture. The facial image feature may specifically be one or a combination of position information, texture information, shape information, color information and the like extracted from the face image. Taking position information as an example, it may refer to the distances and angles between facial organs such as the eyes, nose, mouth and ears.
In other embodiments, the computer device may identify identity image features of the objects in the moving picture and determine their object identifications. An identity image feature is data reflecting an identity characteristic in an image. Taking a moving picture of a sports competition as an example, a player wears a jersey, and the player's identity can be determined by recognizing the number on the jersey.
In one embodiment, the computer device obtains the intersection between the object identification in the commentary content and the object identifications of the objects in the moving picture, and determines the commentary object from the moving picture based on that intersection.
In one embodiment, the computer device obtains the text vector corresponding to the object identification in the commentary content and the text vectors corresponding to the object identifications in the moving picture, and determines the object identification of the commentary object according to the similarities between the text vector of the identification in the commentary content and the text vectors of the identifications in the moving picture, thereby determining the commentary object from the moving picture. A text vector here is a feature vector reflecting the textual characteristics of the object identification itself.
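A compact sketch of these two matching strategies is shown below; the function names are assumptions, and the character-overlap similarity is only a stand-in for the text-vector similarity the patent mentions, which could be computed with any text embedding.

```python
# Sketch of the two matching strategies above. The intersection test handles exact
# identifier matches; the crude character-overlap similarity is a placeholder for a
# real text-vector similarity.

def match_commentary_objects(commentary_ids, picture_ids, sim_threshold=0.8):
    exact = set(commentary_ids) & set(picture_ids)           # intersection of identifiers
    if exact:
        return exact
    def similarity(a: str, b: str) -> float:                 # stand-in for vector similarity
        return len(set(a) & set(b)) / max(len(set(a) | set(b)), 1)
    return {p for c in commentary_ids for p in picture_ids if similarity(c, p) >= sim_threshold}

print(match_commentary_objects({"No. 10", "goalkeeper"}, {"No. 10", "No. 7", "referee"}))
```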
For example, referring to FIG. 5, which is a flow diagram of determining a commentary object from a moving picture in one embodiment, the computer device performs semantic recognition on the commentary content to obtain the object identification in the commentary content, determines the object identifications of the objects in the moving picture, and determines the commentary object 502 from the moving picture based on the object identification in the commentary content and the object identifications of the objects in the moving picture.
In this embodiment, a commentary object matching the commentary content is determined in the moving picture so that the commentator portrait can be embedded in the moving picture at the first position matching the commentary object.
In one embodiment, the computer device embeds the commentator portrait obtained from the commentator picture into the moving picture at the first position matching the commentary object in the moving picture, resulting in a fused picture that includes the commentator portrait.
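A minimal fusion sketch is given below, assuming the portrait is an RGBA cut-out such as the one produced in the segmentation sketch earlier, that (x, y) is the top-left corner of the chosen first position, and that the portrait fits entirely inside the moving picture; the names and the alpha-blending formulation are illustrative assumptions.

```python
# Sketch of embedding the commentator portrait into the moving picture by alpha blending.
import numpy as np

def embed_portrait(moving_picture: np.ndarray, portrait_rgba: np.ndarray, x: int, y: int) -> np.ndarray:
    """Alpha-blend the portrait into the moving picture at (x, y); returns the fused picture."""
    fused = moving_picture.copy()
    h, w = portrait_rgba.shape[:2]
    region = fused[y:y + h, x:x + w].astype(np.float32)
    rgb = portrait_rgba[..., :3].astype(np.float32)
    alpha = portrait_rgba[..., 3:4].astype(np.float32) / 255.0
    fused[y:y + h, x:x + w] = (alpha * rgb + (1 - alpha) * region).astype(np.uint8)
    return fused
```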
In one embodiment, when a single commentary object is determined from the commentary content, the computer device may determine, in the moving picture, the positions whose distance from the position of the commentary object lies within a specified range, obtain a candidate region corresponding to the commentary object, and select an embedding position suitable for the commentator portrait from the candidate region. The distance here may be a picture distance, or a real distance converted from the picture distance; the picture distance can be characterized by a number of pixels.
For example, the positions whose picture distance D from the position of the commentary object satisfies 5 pixels ≤ D ≤ 10 pixels are determined in the moving picture, and together they form the candidate region corresponding to the commentary object. Referring to FIG. 6 (a), which is a schematic diagram of a commentator portrait embedding position in one embodiment, the candidate region 604 corresponding to the commentary object 602 is obtained from the positions whose distance from the position of the commentary object 602 lies within the specified range, and the embedding position suitable for the commentator portrait in the moving picture is selected from the candidate region 604.
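The example's distance band can be sketched directly; the brute-force grid scan and the function name below are illustrative assumptions rather than a prescribed implementation.

```python
# Sketch of the candidate-region rule in the example above: keep every pixel position
# whose picture distance D from the commentary object's position satisfies 5 <= D <= 10.

def candidate_region(object_pos, frame_w, frame_h, d_min=5, d_max=10):
    ox, oy = object_pos
    region = []
    for x in range(frame_w):
        for y in range(frame_h):
            d = ((x - ox) ** 2 + (y - oy) ** 2) ** 0.5
            if d_min <= d <= d_max:
                region.append((x, y))
    return region

print(len(candidate_region((50, 50), frame_w=100, frame_h=100)))  # ring of candidate positions
```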
In one embodiment, the computer device may obtain a picture size of the commentator portrait, and determine an embedding position for the commentator portrait on the moving picture from the candidate region based on that picture size. For a specific implementation of obtaining the picture size of the commentator portrait, refer to the following embodiments; it is not repeated here. It will be appreciated that the computer device selects, from the candidate region, a position where the commentator portrait can be placed as the embedding position of the commentator portrait. For example, referring to fig. 6 (b), fig. 6 (b) is a schematic diagram illustrating a commentator-portrait embedding position in another embodiment. It can be seen that a position where the commentator portrait 606 can be placed is selected from the candidate region 604 as the embedding position of the commentator portrait 606 in the moving picture.
In another embodiment, the computer device may select, from the candidate region, the position whose sum of distances to other objects within a preset range is the largest and whose distance to the commentary object is the smallest as the embedding position suitable for the commentator portrait in the moving picture, wherein the preset range covers the candidate region. For example, referring to fig. 6 (c), fig. 6 (c) is a schematic diagram of a commentator-portrait embedding position in yet another embodiment. It can be seen that the position having the largest sum of distances to other objects within the preset range 608 and the smallest distance to the commentary object is selected from the candidate region 604 as the embedding position suitable for the commentator portrait 606; the distance between the commentator portrait 606 and the commentary object 602 is significantly smaller than the distances between the commentator portrait 606 and the other objects within the preset range 608, so the embedding position of the commentator portrait 606 clearly points toward the commentary object.
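One way to score candidate positions as described above (largest sum of distances to other objects, smallest distance to the commentary object) is a simple weighted objective; the weights `alpha` and `beta`, the function name, and the example coordinates are illustrative assumptions rather than values fixed by the patent.

```python
from math import dist  # Python 3.8+

def pick_embedding_position(candidates, commentary_pos, other_positions, alpha=1.0, beta=1.0):
    """Score = alpha * (sum of distances to other objects) - beta * (distance to the commentary object)."""
    def score(p):
        return alpha * sum(dist(p, o) for o in other_positions) - beta * dist(p, commentary_pos)
    return max(candidates, key=score)

pos = pick_embedding_position(
    candidates=[(300, 170), (330, 190), (310, 200)],
    commentary_pos=(320, 180),
    other_positions=[(600, 400), (100, 90)],
)
print(pos)
```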
In yet another embodiment, the computer device may select, according to a preset azimuth priority, the position corresponding to the azimuth with the highest priority from the candidate region as the embedding position suitable for the commentator portrait on the moving picture. For example, if the preset azimuth priority is top of screen > right of screen > left of screen > bottom of screen, a position toward the top of the screen is preferentially selected from the candidate region as the embedding position of the commentator portrait on the moving picture.
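A sketch of such an azimuth-priority choice is given below, assuming the top > right > left > bottom ordering from the example and classifying each candidate position relative to the commentary object; the classification rule is an assumption for illustration.

```python
PRIORITY = ["top", "right", "left", "bottom"]  # assumed ordering from the example above

def azimuth(candidate, object_pos):
    dx, dy = candidate[0] - object_pos[0], candidate[1] - object_pos[1]
    if abs(dy) >= abs(dx):
        return "top" if dy < 0 else "bottom"   # image y-axis grows downward
    return "right" if dx > 0 else "left"

def pick_by_azimuth(candidates, object_pos):
    return min(candidates, key=lambda c: PRIORITY.index(azimuth(c, object_pos)))

print(pick_by_azimuth([(320, 195), (335, 180), (320, 165)], (320, 180)))  # -> (320, 165), above the object
```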
In one embodiment, when more than one commentary object is determined from the commentary content, the computer device may determine, for each commentary object, the positions in the moving picture whose distance from that commentary object falls within a specified range, obtain the candidate region corresponding to each commentary object, and select an embedding position suitable for the commentator portrait in the moving picture from the intersection of the candidate regions corresponding to the commentary objects. Referring to fig. 7 (a), fig. 7 (a) is a schematic diagram of a commentator-portrait embedding position in still another embodiment. It can be seen that the candidate region 704 corresponding to the commentary object 702 is obtained from the positions within the specified distance range of the commentary object 702, the candidate region 708 corresponding to the commentary object 706 is obtained from the positions within the specified distance range of the commentary object 706, and the embedding position suitable for the commentator portrait in the moving picture is selected from the intersection region 710 of the candidate region 704 and the candidate region 708. In one embodiment, the computer device may obtain a picture size of the commentator portrait, and determine the embedding position for the commentator portrait on the moving picture from the intersection region based on that picture size. It will be appreciated that the computer device selects, from the intersection region, a position where the commentator portrait can be placed as the embedding position of the commentator portrait.
In another embodiment, the computer device may select, from the intersection region, the position whose sum of distances to other objects within the preset ranges corresponding to the commentary objects is the largest and whose sum of distances to the commentary objects is the smallest as the embedding position suitable for the commentator portrait in the moving picture. For example, referring to fig. 7 (b), fig. 7 (b) is a schematic diagram illustrating a commentator-portrait embedding position in yet another embodiment. It can be seen that the other objects covered by the preset range 714 corresponding to the commentary object 702 and by the preset range 716 corresponding to the commentary object 706 are determined, and the position having the largest sum of distances to these other objects and the smallest sum of distances to the commentary object 702 and the commentary object 706 is selected from the intersection region 710 as the embedding position suitable for the commentator portrait 712. The sum of the distances between the commentator portrait 712 and the commentary objects 702 and 706 is significantly smaller than the sum of the distances between the commentator portrait 712 and the other covered objects, so the embedding position of the commentator portrait 712 clearly points toward the commentary objects.
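For the multi-object case, the usable area is the intersection of the per-object candidate regions; the short sketch below restates the earlier ring construction and intersects two such regions. The coordinates, radii, and frame size are assumptions for illustration.

```python
def ring(center, d_min=5, d_max=10, frame=(1280, 720)):
    """Candidate region around one commentary object (same idea as the earlier ring sketch)."""
    cx, cy = center
    return {
        (x, y)
        for x in range(max(cx - d_max, 0), min(cx + d_max + 1, frame[0]))
        for y in range(max(cy - d_max, 0), min(cy + d_max + 1, frame[1]))
        if d_min <= ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 <= d_max
    }

# With two commentary objects, candidate positions must lie in both candidate regions.
intersection = ring((320, 180)) & ring((328, 184))
print(len(intersection), "positions shared by both candidate regions")
```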
It will be appreciated that, in the above embodiments, the computer device may obtain a blank region within the candidate region or the intersection region, and select the embedding position suitable for the commentator portrait in the moving picture from the blank region, so as to avoid blocking other objects in the moving picture.
In one embodiment, the computer device identifies the region in the moving picture where each object is located, and obtains a blank region in the candidate region or intersection region.
In one embodiment, the computer device may identify the region in which each object is located in the moving picture by means of an object detection policy. Object detection is the task of locating specific objects in an image; for example, in some scenarios it is used to locate obstacles in a road image. The object detection policy can be a trained object detection model or a general object detection algorithm. An object detection model can be trained on sample images of a specific scene, so that it is able to detect objects in images of that scene.
Taking the object detection model as an example, referring to fig. 8, fig. 8 is a flowchart of identifying the region where each object is located in a moving picture in one embodiment. It can be seen that the moving picture is input into the object detection model, feature extraction is performed on the moving picture by the object detection model to obtain a feature map of the moving picture, bounding boxes are determined in the moving picture from the feature map by the object detection model, and the regions within the bounding boxes in the moving picture are taken as the regions where the respective objects in the moving picture are located.
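The patent does not name a particular detection model; the sketch below uses a pretrained Faster R-CNN from torchvision purely as a stand-in detector (an assumption, requiring torchvision 0.13+ for the `weights` argument), and the score threshold and random placeholder frame are likewise assumptions.

```python
import torch
import torchvision

# Pretrained Faster R-CNN used only as a stand-in for the unspecified object detection model.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_object_regions(frame_chw, score_threshold=0.6):
    """Return bounding boxes [x1, y1, x2, y2] of the objects found in one moving-picture frame."""
    with torch.no_grad():
        output = model([frame_chw])[0]          # frame_chw: float tensor, shape (3, H, W), values in [0, 1]
    keep = output["scores"] >= score_threshold  # assumed confidence cut-off
    return output["boxes"][keep].tolist()

frame = torch.rand(3, 720, 1280)                # placeholder standing in for a decoded video frame
print(detect_object_regions(frame))
```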
In one embodiment, the playback terminal obtains a fused picture delivered by the computer device, and displays the commentary figure at a first position matched with the commentary object in the moving picture.
In this embodiment, since the embedding position of the commentator portrait in the moving picture is selected from a blank area of the moving picture, the commentator portrait is prevented from blocking objects in the moving picture; and, since the commentator portrait is embedded in the moving picture at the first position matching the commentary object, video viewing users can promptly learn which object the commentator is currently commenting on from the position of the commentator portrait.
In one embodiment, the computer device may determine the picture size of the commentator portrait based on the picture size of the commentary object.
In one embodiment, the computer device may obtain the picture size of the commentary object and scale the commentator portrait using the picture size of the commentary object as a reference. For example, the computer device may obtain the bounding box corresponding to the commentary object, and take the picture length and the picture width of the bounding box as the picture size of the commentary object.
In one embodiment, a computer device obtains a screen size and a real size of an commentary object; determining a ratio between a picture size and a real size of the commentary object; a picture size suitable for the human figure of the commentator displayed in the moving picture is obtained in terms of the ratio and the real size of the commentator in the commentator picture.
Wherein the real size refers to the size of the commentary object in the real scene. Taking a moving picture of a sports competition as an example, the real size refers to the body height and body width of the athlete.
In one embodiment, the computer device may obtain a picture size and a real size of the commentary object, determine a ratio between the picture size and the real size of the commentary object, and determine a picture size of the commentary portrait in accordance with the real size of the commentator and the ratio.
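The ratio-based sizing described above reduces to simple arithmetic, as the sketch below shows; the function name and the numeric values are illustrative assumptions.

```python
def commentator_portrait_size(object_px, object_real_m, commentator_real_m):
    """Scale the commentator portrait so it shares one pixels-per-metre ratio with the commentary object."""
    px_per_metre = object_px / object_real_m
    return commentator_real_m * px_per_metre

# A 1.85 m player rendered 220 px tall gives ~119 px/m; a 1.75 m commentator then becomes ~208 px tall.
print(round(commentator_portrait_size(object_px=220, object_real_m=1.85, commentator_real_m=1.75)))
```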
In this embodiment, the adaptive scaling is performed on the commentator portrait, so as to improve the fusion degree of the commentator portrait in the moving picture.
In the above method for playing video, the commentator portrait is displayed at the first position of the moving picture. Because the first position matches the commentary object, when watching the video the user can quickly learn which object the commentator is currently commenting on from the position of the commentator portrait, and can quickly understand the video content with the help of the commentator portrait, thereby improving the efficiency with which the user acquires video information while watching the video.
In one embodiment, the playback terminal displays the commentator figure at a first location of the moving picture, and when the cancel display condition is satisfied, the playback terminal cancels the display of the commentator figure at the first location in the moving picture that matches the commentator object.
In one embodiment, the playing terminal cancels the display of the commentator portrait at the first position matched with the commentary object in the moving picture, specifically, the playing terminal cancels the display of the commentator portrait in the moving picture, or the playing terminal cancels the display of the commentator portrait at the first position matched with the commentary object and then displays the commentator portrait at the default position in the moving picture.
In one embodiment, the default position may be a position randomly selected by the computer device in a blank area of the moving picture. In other embodiments, a plurality of default positions may be preset in the moving picture, such as the upper left corner, the lower left corner, the upper right corner and the lower right corner; the computer device selects one of the preset default positions, so that the playing terminal displays the commentator portrait at that default position.
The following describes the conditions under which the display of the commentator portrait is canceled:
in one embodiment, the method further comprises: when the commentary object moves out of the moving picture, the display of the commentary portrait at the first location in the moving picture that matches the commentary object is cancelled.
Specifically, the commentary object may move out of the moving picture in the following cases: ① the commentary object is moving and moves out of the acquisition view angle of the photographing device; ② the acquisition view angle of the photographing device changes, and the commentary object cannot be captured at the changed acquisition view angle; ③ the photographing device is switched, and the switched photographing device cannot capture the commentary object.
In one embodiment, when the computer device detects that the commentary object matching the commentary content moves out of the moving picture, the computer device communicates the moving picture to the playback terminal such that the playback terminal cancels the display of the commentary portrait at the first location in the moving picture matching the commentary object.
For example, referring to fig. 9 (a), fig. 9 (a) is a schematic diagram of canceling the display of the commentator portrait in one embodiment. It can be seen that the commentary object 902, the commentator portrait 904, the other object 906 and the other object 908 are included in the preceding moving picture; after the acquisition view angle of the photographing device changes, only the other object 906 and the other object 908 are included in the subsequent moving picture, and the commentary object 902 has moved out of the moving picture, so the display of the commentator portrait 904 at the position matching the commentary object 902 in the moving picture is canceled.
In this embodiment, when the commentary object moves out of the moving picture, the display of the commentator portrait is canceled in time, so as to avoid the excessive occupation of the video picture by the commentator portrait.
In one embodiment, the method further comprises: when the other object is moved in at the first position in the moving picture matched with the commentary object, the display of the commentary portrait at the first position in the moving picture matched with the commentary object is canceled.
Specifically, when one commentary object is determined from the commentary content, the computer device may determine, from the moving picture, the positions whose distance from the commentary object falls within a specified range, obtain the candidate region corresponding to the commentary object, and select an embedding position suitable for the commentator portrait from the blank area of the candidate region; when more than one commentary object is determined from the commentary content, the computer device may determine, for each commentary object, the positions whose distance from that commentary object falls within a specified range, obtain the candidate region corresponding to each commentary object, and select an embedding position suitable for the commentator portrait from the blank area of the intersection of the candidate regions. When the candidate region or the intersection region corresponding to the commentary object in the moving picture has no blank area, the display of the commentator portrait at the first position matching the commentary object in the moving picture is canceled.
In one embodiment, when the computer device detects that another object has moved into the first position in the moving picture that matches the commentary object, so that the commentator portrait can no longer be embedded, the computer device communicates the moving picture to the playback terminal, causing the playback terminal to cancel the display of the commentator portrait at the first position in the moving picture.
For example, referring to fig. 9 (b), fig. 9 (b) is a schematic diagram of canceling the display of the commentator portrait in another embodiment. It can be seen that the preceding moving picture includes the commentary object 902, the commentator portrait 904, and the candidate region 910 corresponding to the commentary object 902; in the subsequent moving picture, another object has moved into the candidate region 910 and there is no longer a position where the commentator portrait 904 can be placed, so the display of the commentator portrait 904 at the position matching the commentary object 902 in the moving picture is canceled.
In this embodiment, when the commentator portrait cannot be embedded in the position matching with the commentator object, the display of the commentator portrait is timely canceled, so as to avoid the commentator portrait from blocking other objects in the video frame.
In one embodiment, a computer device obtains commentary for commentary speech for a moving picture acquired with a commentator picture, and determines a commentary object matching the commentary in the synchronously acquired moving picture. The following describes the contents of the switching commentary:
In one embodiment, the method further comprises: when the commentary object matched with the commentary content about the moving picture collected with the commentary picture changes from the first commentary object to the second commentary object, and the second commentary object is not included in the moving picture, the display of the commentary portrait at the first position in the moving picture matched with the first commentary object is canceled.
In one embodiment, when the computer device detects that the commentary object matched with the commentary content collected with the commentator picture about the moving picture changes from a first commentary object to a second commentary object, and the second commentary object is not included in the moving picture, the computer device communicates the moving picture to the playback terminal so that the playback terminal cancels the display of the commentator portrait at the first position in the moving picture matched with the first commentary object.
For example, referring to fig. 10 (a), fig. 10 (a) is a schematic diagram of switching the commentary object in one embodiment. It can be seen that the commentary object 1002, the commentator portrait 1004, the other object 1006 and the other object 1008 are included in the preceding moving picture; the commentary object determined from the commentary content corresponding to the subsequent moving picture is the other object 1008, and the other object 1008 is not included in the subsequent moving picture, so the display of the commentator portrait 1004 at the first position in the moving picture matching the commentary object 1002 is canceled.
In this embodiment, when the commentary object changes and the commentary object after the change is not in the moving picture, the display of the commentary portrait is canceled in time, so as to avoid the excessive occupation of the video picture by the commentary portrait.
In one embodiment, the method further comprises: when an commentary object matched with commentary content related to a moving picture collected with the commentary picture changes from a first commentary object to a second commentary object in the moving picture, a commentary portrait of the commentary picture collected in synchronization with the moving picture is displayed at a first position in the moving picture matched with the second commentary object.
In one embodiment, when the computer device detects that the commentary object matched with the commentary content collected with the commentator picture about the moving picture changes from a first commentary object to a second commentary object in the moving picture, the computer device embeds the commentator portrait of the commentator picture collected in synchronization with the moving picture into the moving picture at the first position matched with the second commentary object, obtains a fused picture including the commentator portrait, and passes the fused picture to the playback terminal, so that the playback terminal cancels the display of the commentator portrait at the first position matched with the first commentary object and displays the commentator portrait at the first position matched with the second commentary object.
For example, referring to fig. 10 (b), fig. 10 (b) is a schematic diagram of switching the commentary object in another embodiment. It can be seen that the commentary object 1002, the commentator portrait 1004, and the other object 1008 are included in the preceding moving picture; the commentary object determined from the commentary content corresponding to the subsequent moving picture is the other object 1008, so the display of the commentator portrait 1004 at the first position matching the commentary object 1002 in the moving picture is canceled, and the commentator portrait 1010 of the commentator picture acquired in synchronization with the subsequent moving picture is displayed at the first position matching the other object 1008.
In this embodiment, when the commentary object changes, the commentator image is moved to the position matching the changed commentary object in time, so that the video viewing user can learn the object currently commented by the commentator in time.
In one embodiment, displaying the commentator portrait of the commentator picture acquired in synchronization with the moving picture at the first position in the moving picture that matches the second commentary object includes: when the distance between the second commentary object and the first commentary object in the moving picture is smaller than a threshold, displaying, in the moving picture, the commentator portrait gradually moving from the first position matched with the first commentary object to the first position matched with the second commentary object.
In one embodiment, when the computer device detects that the distance between the second commentary object and the first commentary object in the moving picture is less than the threshold, the computer device may gradually move the embedding position of the commentary portrait in the moving picture when synthesizing the fusion picture, such that the playing terminal displays the commentary portrait in the moving picture gradually moving from the first position matched with the first commentary object to the first position matched with the second commentary object.
In one embodiment, the computer device may obtain the straight-line distance between the first position matched with the first commentary object and the first position matched with the second commentary object, divide the straight-line distance equally to obtain at least one movement node, and, when synthesizing the fused picture, embed the commentator portrait into the moving picture at each movement node in turn. For example, referring to fig. 11, fig. 11 is a schematic diagram illustrating moving the embedding position of the commentator portrait in one embodiment. It can be seen that, by obtaining the straight-line distance between the position 1102 matched with the first commentary object and the position 1110 matched with the second commentary object and dividing it equally, the commentator portrait is embedded into the moving picture at the movement node 1104, the movement node 1106 and the movement node 1108 in turn when the fused picture is synthesized, so that the commentator portrait gradually moves from the position 1102 matched with the first commentary object to the position 1110 matched with the second commentary object.
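The equal division of the straight line into movement nodes can be sketched as below; the number of nodes and the coordinates are assumptions chosen only to illustrate the interpolation.

```python
def move_nodes(start, end, steps=3):
    """Equally divide the straight line between two embedding positions into intermediate nodes."""
    (x0, y0), (x1, y1) = start, end
    return [
        (round(x0 + (x1 - x0) * i / (steps + 1)), round(y0 + (y1 - y0) * i / (steps + 1)))
        for i in range(1, steps + 1)
    ]

# Embed the portrait at each node in successive frames to move it from the first to the second object.
print(move_nodes(start=(300, 170), end=(420, 230)))   # -> [(330, 185), (360, 200), (390, 215)]
```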
In this embodiment, when the second comment object in the moving picture is closer to the first comment object, a comment person figure gradually moving from a position matching the first comment object to a position matching the second comment object may be displayed to enhance the natural feeling and the sense of integration of the comment person figure.
In one embodiment, displaying the commentator portrait of the commentator picture acquired in synchronization with the moving picture at the first position in the moving picture that matches the second commentary object includes: when the distance between the second commentary object and the first commentary object in the moving picture is greater than or equal to the threshold, canceling the display of the commentator portrait at the first position in the moving picture matched with the first commentary object, and displaying the commentator portrait at the first position in the moving picture matched with the second commentary object.
In one embodiment, when the computer device detects that the distance between the second comment object and the first comment object in the moving picture is greater than or equal to the threshold, the computer device embeds the comment person image obtained from the comment person picture in the moving picture according to the first position matched with the second comment object to obtain a fused picture including the comment person image, and transfers the fused picture to the playing terminal, so that the playing terminal cancels the display of the comment person image at the first position matched with the first comment object in the moving picture, and displays the comment person image at the first position matched with the second comment object in the moving picture.
In this embodiment, when the distance between the second commentary object and the first commentary object in the moving picture is large, the display of the commentator portrait at the position matching the first commentary object may be canceled first, and the commentator portrait is then displayed at the position matching the second commentary object, so as to avoid the abrupt impression that a long-distance movement of the commentator portrait would cause.
In one embodiment, the method further comprises: when the commentary object is moved in the moving picture, a commentary portrait moving in the moving picture is displayed at a first position that matches the commentary object moving in the moving picture.
In one embodiment, when the commentary object moves in the moving picture, the computer device may select an embedding position applicable to the commentary figure in the moving picture from the blank regions of the candidate regions corresponding to the commentary object (or the blank regions of the intersection regions corresponding to more than one commentary object), such that the playing terminal displays the commentary figure moving in the moving picture at a first position matching the commentary object moving in the moving picture.
In this embodiment, the commentator figures move along with the commentary object, so that the natural sense and the blending sense of the commentator figures in the video picture are enhanced.
In one embodiment, the method further comprises: when the acquisition view angle of the moving picture deviates, the commentator portrait of the commentator picture acquired synchronously with the moving picture is displayed at a first position which is matched with the commentator object in the moving picture and gradually deviates along with the deviation of the acquisition view angle.
In one embodiment, when the acquisition view angle of the moving picture shifts, the computer device may gradually shift the embedding position of the commentator portrait within the blank area of the candidate region corresponding to the commentary object (or the blank area of the intersection region corresponding to more than one commentary object), so that the playing terminal displays the commentator portrait at a position in the moving picture that matches the commentary object and gradually shifts with the shift of the acquisition view angle.
In one embodiment, the offset distance of the embedded position of the commentator portrait is positively correlated with the offset distance of the acquisition viewing angle of the photographing device, and the offset direction of the embedded position of the commentator portrait coincides with the offset direction of the acquisition viewing angle of the photographing device.
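A minimal sketch of this correlation is shown below: the portrait's embedding position is shifted in the same direction as the view-angle shift, with a gain controlling the positive correlation. The gain value, the function name, and the pixel figures are assumptions for illustration.

```python
def shifted_portrait_position(portrait_pos, view_shift_px, gain=1.0):
    """Shift the embedding position in the same direction as the camera's view-angle shift.

    gain controls how strongly the portrait follows the shift (positive correlation); 1.0 is an assumption.
    """
    (px, py), (dx, dy) = portrait_pos, view_shift_px
    return (px + gain * dx, py + gain * dy)

# The acquisition view angle shifts by 40 px horizontally; the portrait's embedding position follows.
print(shifted_portrait_position(portrait_pos=(1100, 120), view_shift_px=(40, 0)))
```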
For example, referring to fig. 12, fig. 12 is a schematic diagram illustrating moving the embedding position of the commentator portrait in another embodiment. It can be seen that the acquisition view angle of the photographing device changes; in the subsequent moving picture, the original position of the commentator portrait 1204 has moved out of the moving picture, but because the position of the commentator portrait 1204 is shifted, the commentator portrait is still displayed in the moving picture.
In this embodiment, the commentator portrait is gradually shifted along with the shift of the acquisition view angle, so as to prevent the commentator portrait from being shifted out of the moving picture.
In one embodiment, the method further comprises: when the size of the commentary object in the moving picture changes, a commentary figure which changes the size synchronously with the change of the size of the commentary object is displayed at a first position matched with the commentary object in the moving picture.
In one embodiment, when the size of the commentary object changes because of a change in lens distance or the like, the computer device correspondingly resizes the commentator portrait before embedding it into the moving picture, so that the playing terminal displays, at the first position matched with the commentary object in the moving picture, a commentator portrait whose size changes synchronously with the change in the size of the commentary object.
In one embodiment, the degree to which the size of the commentator portrait is changed is consistent with the degree to which the size of the commentary object changes.
In one embodiment, when the size of the commentary object changes such that only part of the commentary object is displayed in the moving picture, the computer device correspondingly crops and scales the commentator portrait before embedding it in the moving picture. For example, referring to fig. 13, fig. 13 is a schematic diagram of a commentator-portrait size change in one embodiment. It can be seen that when the commentary object 1302 is enlarged, the commentator portrait 1304 is enlarged to the same degree.
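The crop-and-scale step can be sketched as follows; scaling by the object's zoom factor and cropping from the bottom by the visible fraction are assumptions made only to illustrate the idea, and the numbers are illustrative.

```python
def fit_portrait(portrait_w, portrait_h, scale, visible_fraction=1.0):
    """Scale the portrait with the commentary object, then crop it when only part of the object is shown."""
    new_w, new_h = portrait_w * scale, portrait_h * scale
    cropped_h = new_h * visible_fraction          # crop from the bottom when the object is partly out of frame
    return round(new_w), round(cropped_h)

# The commentary object is shown twice as large but only its upper 60% is visible.
print(fit_portrait(portrait_w=80, portrait_h=200, scale=2.0, visible_fraction=0.6))  # -> (160, 240)
```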
In this embodiment, the commentator portrait changes with the size of the commentary object, which improves how well and how naturally the commentator portrait blends into the video picture.
In one embodiment, the method further comprises: in the moving picture, an object identification corresponding to the object is displayed in an area adjacent to the object in the moving picture.
The area adjacent to the object may be an area having a distance from the position of the object within a specified range.
In one embodiment, the computer device may mark the object identifier corresponding to the object in the moving picture, and issue the marked image to the playing terminal, so that the playing terminal displays the object identifier corresponding to the object in the moving picture in an area adjacent to the object in the moving picture.
For example, referring to fig. 14, fig. 14 is a schematic diagram showing an object identifier corresponding to an object in one embodiment. It can be seen that in the moving picture, an area adjacent to the object displays an object identification corresponding to the object.
In one embodiment, in the moving picture, an object identifier corresponding to the comment object is displayed in an area adjacent to the comment object.
In this embodiment, in the moving picture, the object identifier corresponding to the object is displayed in the area adjacent to the object, so that the video viewing user can more quickly understand the video content in combination with the identifier content.
In one embodiment, displaying a moving picture at the video playback interface includes: displaying a moving picture collected from a sports competition on the video playing interface; displaying the commentator portrait at the first position of the moving picture includes: displaying, in the moving picture, at a first position matched with a target player in the moving picture, the commentator portrait of a game commentator picture acquired in synchronization with the moving picture.
The sports competition can be a competition in a real-world scene, such as football, basketball or volleyball.
In one embodiment, referring to fig. 15, fig. 15 is a flow diagram of a method of playing video in one embodiment.
Firstly, the computer device acquires a live broadcast segment and a game commentator live broadcast segment, extracts a frame of moving picture from the live broadcast segment, and extracts, from the game commentator live broadcast segment, the game commentator picture synchronously acquired with the moving picture;
Then, the computer device performs image segmentation on the game commentator picture to obtain a segmentation result corresponding to the game commentator picture, and segments the region where the game commentator is located from the game commentator picture according to the segmentation result, to obtain the commentator portrait;
Next, the computer device acquires the commentary content about the moving picture collected along with the game commentator picture, performs semantic recognition on the commentary content to obtain the player identification in the commentary content, determines the player identification of each object in the synchronously acquired moving picture, and determines a target player from the moving picture according to the player identification in the commentary content and the player identifications of the objects in the moving picture;
Next, the computer device obtains the picture size and the real size of the target player, determines the ratio between the picture size and the real size of the target player, and obtains, according to the ratio and the real size of the commentator in the game commentator picture, the picture size of the commentator portrait suitable for being displayed in the moving picture;
Next, the computer device embeds the commentator portrait obtained from the game commentator picture into the moving picture at the first position matched with the target player in the moving picture, obtaining a fused picture including the commentator portrait;
Finally, the computer device transmits the fused picture to the playing terminal, so that the playing terminal displays the commentator portrait of the game commentator picture synchronously acquired with the moving picture at the first position matched with the target player in the moving picture.
In this embodiment, the commentator portrait can be naturally and smoothly merged into the live broadcast segment; a live broadcast viewing user can quickly learn which player the game commentator is currently commenting on from the position of the commentator portrait in each frame of moving picture, and can quickly understand the live broadcast content with the help of the commentator portrait in each frame, so that the efficiency with which the viewing user acquires live broadcast information while watching the live broadcast is improved. In addition, the live broadcast picture is prevented from switching back and forth between the competition scene and the game commentator's live broadcast room, and the viewing user is prevented from missing content of the competition scene, which improves the comprehensiveness of the live broadcast information acquired while watching the live broadcast.
In one embodiment, as shown in fig. 16, a method for playing video is provided. This embodiment is described by taking its application to a computer device (the playing terminal 102 or the server 104 in fig. 1) as an example, and includes the following steps:
Step S1602, a live broadcast segment and a commentator live broadcast segment are acquired, a frame of moving picture is extracted from the live broadcast segment, and the commentator picture synchronously acquired with the moving picture is extracted from the commentator live broadcast segment.
In step S1604, image segmentation is performed on the commentator picture to obtain a segmentation result corresponding to the commentator picture, and the region where the commentator is located is segmented from the commentator picture according to the segmentation result, to obtain the commentator portrait.
Step S1606, obtain the comment content related to the moving picture collected along with the comment member picture, perform semantic recognition on the comment content, obtain the object identification in the comment content, determine the object identification of each object in the moving picture collected synchronously, and determine the comment object from the moving picture according to the object identification in the comment content and the object identification of each object in the moving picture.
Step S1608, obtaining the picture size and the real size of the commentary object, determining the ratio between the picture size and the real size of the commentary object, and obtaining the picture size of the commentator figure suitable for displaying in the moving picture according to the ratio and the real size of the commentator in the commentator picture.
Step S1610, embedding the commentator portrait obtained from the commentator picture into the moving picture according to the first position matched with the commentator object in the moving picture, to obtain a fusion picture including the commentator portrait.
In one embodiment, referring to fig. 17, fig. 17 is a flow chart of a method of playing video in another embodiment. It can be seen that the server can perform the synthesis of the fused picture and deliver the fused picture to the playing terminal, so that the playing terminal displays the commentator portrait of the game commentator picture synchronously collected with the moving picture at the first position matched with the target player in the moving picture.
According to the above video playing method, the commentator portrait can be naturally and smoothly merged into the live broadcast segment; a live broadcast viewing user can quickly learn which object the commentator is currently commenting on from the position of the commentator portrait in each frame of moving picture, and can quickly understand the live broadcast content with the help of the commentator portrait in each frame, so that the efficiency with which the viewing user acquires live broadcast information while watching the live broadcast is improved. In addition, the live broadcast picture is prevented from switching back and forth between the competition scene and the commentator's live broadcast room, and the viewing user is prevented from missing content of the competition scene, which improves the comprehensiveness of the live broadcast information acquired while watching the live broadcast.
It should be understood that, although the steps in the flowcharts of fig. 2 and fig. 16 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 2 and fig. 16 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and the order of performing these sub-steps or stages is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 18, an apparatus for playing video is provided, where the apparatus may use a software module or a hardware module, or a combination of both, as a part of a computer device, and specifically includes: a display module 1802, wherein:
the display module 1802 is configured to display a video playing interface;
The display module 1802 is further configured to display a moving picture on the video playing interface;
the display module 1802 is further configured to display a commentator portrait at a first position of the moving picture; wherein the first position matches a commentary object of the commentator.
In one embodiment, the apparatus for playing video further includes a cancel module configured to: when the commentary object moves out of the moving picture, the display of the commentary portrait at the first location in the moving picture that matches the commentary object is cancelled.
In one embodiment, the cancellation module is further to: when the other object is moved in at the first position in the moving picture matched with the commentary object, the display of the commentary portrait at the first position in the moving picture matched with the commentary object is canceled.
In one embodiment, the cancellation module is further to: when the commentary object matched with the commentary content about the moving picture collected with the commentary picture changes from the first commentary object to the second commentary object, and the second commentary object is not included in the moving picture, the display of the commentary portrait at the first position in the moving picture matched with the first commentary object is canceled.
In one embodiment, the display module 1802 is further configured to: when an commentary object matched with commentary content related to a moving picture collected with the commentary picture changes from a first commentary object to a second commentary object in the moving picture, a commentary portrait of the commentary picture collected in synchronization with the moving picture is displayed at a first position in the moving picture matched with the second commentary object.
In one embodiment, the apparatus for playing video further comprises a mobile module for: when the distance between the second comment object and the first comment object in the moving picture is smaller than the threshold value, displaying a comment person image gradually moving from the first position matched with the first comment object to the first position matched with the second comment object in the moving picture.
In one embodiment, the mobile module is further configured to: when the distance between the second comment object and the first comment object in the moving picture is greater than or equal to a threshold value, canceling the display of the comment person image at the first position in the moving picture matched with the first comment object; at a first location in the moving picture that matches the second commentary object, a commentary portrait is displayed.
In one embodiment, the display module 1802 is further configured to: when the commentary object is moved in the moving picture, a commentary portrait moving in the moving picture is displayed at a first position that matches the commentary object moving in the moving picture.
In one embodiment, the display module 1802 is further configured to: when the acquisition view angle of the moving picture deviates, the commentator portrait of the commentator picture acquired synchronously with the moving picture is displayed at a first position which is matched with the commentator object in the moving picture and gradually deviates along with the deviation of the acquisition view angle.
In one embodiment, the display module 1802 is further configured to: when the size of the commentary object in the moving picture changes, a commentary figure which changes the size synchronously with the change of the size of the commentary object is displayed at a first position matched with the commentary object in the moving picture.
In one embodiment, the display module 1802 is further configured to: in the moving picture, an object identification corresponding to the object is displayed in an area adjacent to the object in the moving picture.
In one embodiment, the display module 1802 is further configured to: acquiring commentary content about a moving picture acquired along with a commentator picture; a commentary object matching the commentary content is determined in the synchronously acquired moving pictures.
In one embodiment, the display module 1802 is further configured to: semantic recognition is carried out on the comment content, and object identification in the comment content is obtained; determining object identifiers of all objects in the synchronously acquired moving pictures; the narrative object is determined from the moving picture based on the object identification in the narrative content and the object identification of each object in the moving picture.
In one embodiment, the display module 1802 is further configured to: acquiring an commentator picture acquired synchronously with the moving picture; image segmentation is carried out on the commentator picture, and a segmentation result corresponding to the commentator picture is obtained; and dividing the region where the commentator is located from the commentator picture according to the dividing result to obtain the commentator portrait.
In one embodiment, the display module 1802 is further configured to: acquiring the picture size and the real size of the explanation object; determining a ratio between a picture size and a real size of the commentary object; a picture size suitable for the human figure of the commentator displayed in the moving picture is obtained in terms of the ratio and the real size of the commentator in the commentator picture.
In one embodiment, the display module 1802 is further configured to: and embedding the commentator portrait obtained from the commentator picture into the moving picture according to a first position matched with the commentator object in the moving picture to obtain a fusion picture comprising the commentator portrait.
In one embodiment, the display module 1802 is further configured to: displaying a moving picture collected from a sports competition on a video playing interface; in the moving picture, at a first position matched with a target player in the moving picture, an commentator figure of a race commentator picture acquired in synchronization with the moving picture is displayed.
For specific limitations on the apparatus for playing video, reference may be made to the above limitations on the method for playing video, which are not repeated here. The modules in the above apparatus for playing video may be implemented in whole or in part by software, hardware, or a combination of the two. The above modules may be embedded in hardware form in, or be independent of, the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to the above modules.
In the above apparatus for playing video, the commentator portrait is displayed at the first position of the moving picture. Because the first position matches the commentary object, when watching the video the user can quickly learn which object the commentator is currently commenting on from the position of the commentator portrait, and can quickly understand the video content with the help of the commentator portrait, thereby improving the efficiency with which the user acquires video information while watching the video.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure thereof may be as shown in fig. 19. The computer device includes a processor, a memory, a network interface, and a display screen connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for carrying out network communication in a wired or wireless mode with an external terminal, and the wireless mode can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of playing video. The display of the computer device may be a liquid crystal display or an electronic ink display.
It will be appreciated by those skilled in the art that the structure shown in FIG. 19 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps in the above-described method embodiments.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, or the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration, and not limitation, RAM can be in various forms such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.
Claims (28)
1. A method of playing video, the method comprising:
Displaying a video playing interface;
displaying a moving picture on the video playing interface;
acquiring a commentator picture acquired synchronously with the moving picture, and extracting a commentator portrait from the commentator picture;
acquiring commentary content about the moving picture collected with the commentary picture, the commentary content being used to describe the moving picture;
Performing semantic recognition on the comment content to obtain an object identifier in the comment content;
Determining object identifiers of all objects in the synchronously acquired moving pictures;
Determining a commentary object matched with the commentary content from the moving picture according to an intersection between the object identification in the commentary content and the object identification of each object in the moving picture;
when there is more than one commentary object, determining, from the moving picture, positions whose distances from the positions of the respective commentary objects are within a specified range, obtaining candidate regions corresponding to the respective commentary objects, and selecting, from an intersection region of the candidate regions corresponding to the respective commentary objects, a position whose sum of distances to other objects within preset ranges corresponding to the respective commentary objects is the largest and whose sum of distances to the respective commentary objects is the smallest, as a first position suitable for embedding the commentator portrait in the moving picture;
Displaying the embedded commentator portrait at a first location of the moving picture;
when a commentary object matched with the commentary content about the moving picture collected with the commentator picture changes from a first commentary object to a second commentary object in the moving picture, and the distance between the second commentary object and the first commentary object in the moving picture is less than a threshold, displaying, in the moving picture, the commentator portrait gradually moving from the first position matched with the first commentary object to the first position matched with the second commentary object.
2. The method according to claim 1, wherein the method further comprises:
when the commentary object moves out of the moving picture, the display of the commentary portrait at a first location in the moving picture that matches the commentary object is cancelled.
3. The method according to claim 1, wherein the method further comprises:
when another object moves into a first position in the moving picture that matches the commentary object, canceling the display of the commentator portrait at the first position in the moving picture that matches the commentary object.
4. The method according to claim 1, wherein the method further comprises:
when a commentary object matched with the commentary content about the moving picture collected with the commentator picture changes from a first commentary object to a second commentary object, and the second commentary object is not included in the moving picture, canceling the display of the commentator portrait at the first position in the moving picture matched with the first commentary object.
5. The method according to claim 1, wherein the method further comprises:
Canceling display of the commentator portrait at a first location in the moving picture that matches the first commentator object when a distance of a second commentator object in the moving picture from the first commentator object is greater than or equal to a threshold;
The commentator portrait is displayed at a first location in the moving picture that matches the second commentary object.
6. The method according to claim 1, wherein the method further comprises:
when the commentary object moves in the moving picture, the commentary portrait moving in the moving picture is displayed at a first position matching the commentary object moving in the moving picture.
7. The method according to claim 1, wherein the method further comprises:
When the acquisition view angle of the moving picture deviates, displaying the commentator portrait of the commentator picture synchronously acquired with the moving picture at a first position which is matched with the commentator object in the moving picture and gradually deviates along with the deviation of the acquisition view angle.
8. The method according to claim 1, wherein the method further comprises:
when the size of the commentary object in the moving picture changes, the commentary portrait of which the size changes synchronously with the change of the size of the commentary object is displayed at a first position matched with the commentary object in the moving picture.
9. The method according to claim 1, wherein the method further comprises:
in the moving picture, displaying an object identifier corresponding to an object in the moving picture in an area adjacent to that object.
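Claim 9's adjacent object identifier can be rendered with ordinary drawing primitives. A hedged OpenCV sketch, assuming the object is already available as a bounding box and treating the font, colour and offsets as arbitrary choices:

```python
import cv2
import numpy as np

def draw_object_label(frame, box, label):
    """Draw `label` in an area adjacent to the object's bounding box.

    `box` is (x, y, w, h); the label is placed just above the box, or just
    below it when the box touches the top edge of the frame.
    """
    x, y, w, h = box
    org = (x, y - 8) if y - 20 >= 0 else (x, y + h + 20)
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(frame, label, org, cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame

# Example on a blank 720p frame with an assumed player box.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
draw_object_label(frame, (600, 300, 80, 160), "Player #10")
```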
10. The method according to claim 1, wherein the method further comprises:
acquiring a commentator picture acquired in synchronization with the moving picture;
performing image segmentation on the commentator picture to obtain a segmentation result corresponding to the commentator picture; and
segmenting, from the commentator picture according to the segmentation result, the region where the commentator is located, to obtain the commentator portrait.
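Claim 10 does not fix a segmentation model, so the sketch below assumes the segmentation result is already available as a binary person mask (the `person_mask` argument is a placeholder for whatever model produced it) and only shows the final step of cutting the commentator region out as an RGBA portrait:

```python
import numpy as np

def extract_commentator_portrait(commentator_frame, person_mask):
    """Cut the commentator region out of the commentator picture.

    `commentator_frame` is an HxWx3 uint8 image; `person_mask` is an HxW
    boolean array from an unspecified segmentation model (placeholder).
    Returns an RGBA crop tight around the segmented region, with the
    background fully transparent.
    """
    ys, xs = np.where(person_mask)
    if ys.size == 0:
        return None                                   # no commentator found
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    crop = commentator_frame[y0:y1, x0:x1]
    alpha = (person_mask[y0:y1, x0:x1] * 255).astype(np.uint8)
    return np.dstack([crop, alpha])                   # HxWx4 RGBA portrait
```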
11. The method according to claim 1, wherein the method further comprises:
acquiring the picture size and the real size of the commentary object;
determining the ratio between the picture size and the real size of the commentary object; and
obtaining, according to the ratio and the real size of the commentator in the commentator picture, a picture size at which the commentator portrait is suitable to be displayed in the moving picture.
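The size computation in claim 11 is a pixels-per-unit ratio measured on the commentary object and re-applied to the commentator's real size. A one-function sketch, with metres as the assumed unit and heights standing in for the unspecified "size":

```python
def portrait_display_height(obj_px_height, obj_real_height_m, commentator_real_height_m):
    """Picture size (in pixels) at which the commentator portrait fits the scene.

    ratio = picture size / real size of the commentary object; the portrait's
    display size is that ratio applied to the commentator's real size.
    """
    ratio = obj_px_height / obj_real_height_m      # pixels per metre in the moving picture
    return ratio * commentator_real_height_m

# Example: a 1.80 m player appears 270 px tall; the commentator is 1.75 m tall.
print(portrait_display_height(270, 1.80, 1.75))    # -> 262.5 px
```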
12. The method according to claim 1, wherein the method further comprises:
embedding the commentator portrait obtained from the commentator picture into the moving picture at the first position matching the commentary object in the moving picture, to obtain a fusion picture including the commentator portrait.
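The fusion picture of claim 12 can be obtained by alpha-compositing an RGBA portrait (such as the one produced in the claim-10 sketch above) into the moving picture at the first position. A minimal numpy version, taking the first position as the portrait's top-left corner, which is an assumption the claim does not make:

```python
import numpy as np

def fuse_portrait(moving_frame, portrait_rgba, first_position):
    """Alpha-composite the commentator portrait into the moving picture.

    `moving_frame` is HxWx3 uint8, `portrait_rgba` is hxwx4 uint8, and
    `first_position` is the (x, y) top-left anchor of the portrait.
    Returns a new fusion frame; the portrait is clipped at frame borders.
    """
    fused = moving_frame.copy()
    x, y = first_position
    h, w = portrait_rgba.shape[:2]
    H, W = fused.shape[:2]
    # Clip the portrait to the visible part of the frame.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, W), min(y + h, H)
    if x0 >= x1 or y0 >= y1:
        return fused
    patch = portrait_rgba[y0 - y:y1 - y, x0 - x:x1 - x]
    alpha = patch[..., 3:4].astype(np.float32) / 255.0
    region = fused[y0:y1, x0:x1].astype(np.float32)
    fused[y0:y1, x0:x1] = (alpha * patch[..., :3] + (1 - alpha) * region).astype(np.uint8)
    return fused
```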
13. The method according to any one of claims 1 to 12, wherein the displaying a moving picture on the video playing interface includes:
displaying, on the video playing interface, a moving picture acquired from a sports competition;
and the displaying of the commentator portrait at the first position of the moving picture includes:
in the moving picture, displaying, at a first position matching a target player in the moving picture, the commentator portrait of a match commentator picture acquired in synchronization with the moving picture.
14. An apparatus for playing video, the apparatus comprising:
a display module, used for displaying a video playing interface;
The display module is also used for displaying a moving picture on the video playing interface;
the display module is also used for acquiring a commentator picture acquired in synchronization with the moving picture, and extracting the commentator portrait from the commentator picture; acquiring commentary content about the moving picture collected with the commentator picture, the commentary content being used to describe the moving picture; performing semantic recognition on the commentary content to obtain object identifiers in the commentary content; determining object identifiers of the objects in the synchronously acquired moving picture; determining, from the moving picture, a commentary object matching the commentary content according to the intersection between the object identifiers in the commentary content and the object identifiers of the objects in the moving picture; when more than one commentary object is determined according to the commentary content, determining, from the moving picture, positions whose distances from the positions of the commentary objects fall within a specified range, to obtain a candidate region corresponding to each commentary object, and selecting, from the intersection region of the candidate regions corresponding to the commentary objects, a position at which the sum of the distances to other objects within a preset range is maximum and the sum of the distances to the commentary objects is minimum, as a first position suitable for embedding the commentator portrait in the moving picture; and displaying the embedded commentator portrait at the first position of the moving picture;
a moving module, used for displaying, in the moving picture, the commentator portrait gradually moving from the first position matching a first commentary object to the first position matching a second commentary object when the commentary object that matches the commentary content collected with the commentator picture about the moving picture changes from the first commentary object to the second commentary object in the moving picture, and the distance of the second commentary object from the first commentary object in the moving picture is less than a threshold.
15. The apparatus of claim 14, wherein the apparatus for playing video further comprises a cancellation module, used for: when the commentary object moves out of the moving picture, canceling display of the commentator portrait at the first position in the moving picture that matches the commentary object.
16. The apparatus of claim 14, wherein the cancellation module is further configured to: when another object moves into the first position in the moving picture that matches the commentary object, cancel display of the commentator portrait at that first position.
17. The apparatus of claim 14, wherein the cancellation module is further configured to: when the commentary object that matches the commentary content collected with the commentator picture about the moving picture changes from a first commentary object to a second commentary object, and the second commentary object is not included in the moving picture, cancel display of the commentator portrait at the first position in the moving picture that matches the first commentary object.
18. The apparatus of claim 14, wherein the moving module is further configured to: when the distance of the second commentary object from the first commentary object in the moving picture is greater than or equal to the threshold, cancel display of the commentator portrait at the first position in the moving picture that matches the first commentary object; and display the commentator portrait at the first position in the moving picture that matches the second commentary object.
19. The apparatus of claim 14, wherein the display module is further configured to: when the commentary object moves within the moving picture, display, at the first position matching the moving commentary object, the commentator portrait moving along with the commentary object.
20. The apparatus of claim 14, wherein the display module is further configured to: when the acquisition view angle of the moving picture shifts, display the commentator portrait of the commentator picture acquired in synchronization with the moving picture at the first position matching the commentary object in the moving picture, the first position gradually shifting along with the shift of the acquisition view angle.
21. The apparatus of claim 14, wherein the display module is further configured to: when the size of the commentary object in the moving picture changes, display, at the first position matching the commentary object in the moving picture, the commentator portrait whose size changes synchronously with the change in size of the commentary object.
22. The apparatus of claim 14, wherein the display module is further configured to: in the moving picture, display an object identifier corresponding to an object in the moving picture in an area adjacent to that object.
23. The apparatus of claim 14, wherein the display module is further configured to: acquire a commentator picture acquired in synchronization with the moving picture; perform image segmentation on the commentator picture to obtain a segmentation result corresponding to the commentator picture; and segment, from the commentator picture according to the segmentation result, the region where the commentator is located, to obtain the commentator portrait.
24. The apparatus of claim 14, wherein the display module is further configured to: acquire the picture size and the real size of the commentary object; determine the ratio between the picture size and the real size of the commentary object; and obtain, according to the ratio and the real size of the commentator in the commentator picture, a picture size at which the commentator portrait is suitable to be displayed in the moving picture.
25. The apparatus of claim 14, wherein the display module is further configured to: embed the commentator portrait obtained from the commentator picture into the moving picture at the first position matching the commentary object in the moving picture, to obtain a fusion picture including the commentator portrait.
26. The apparatus of claim 14, wherein the display module is further configured to: display, on the video playing interface, a moving picture acquired from a sports competition; and display, in the moving picture, at a first position matching a target player in the moving picture, the commentator portrait of a match commentator picture acquired in synchronization with the moving picture.
27. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 13 when the computer program is executed.
28. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 13.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110190796.1A (CN114979741B) | 2021-02-20 | 2021-02-20 | Method, device, computer equipment and storage medium for playing video |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN114979741A | 2022-08-30 |
| CN114979741B | 2024-08-06 |
Family
ID=82953921
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110190796.1A (CN114979741B, Active) | Method, device, computer equipment and storage medium for playing video | 2021-02-20 | 2021-02-20 |
Country Status (1)

| Country | Link |
|---|---|
| CN | CN114979741B |
Families Citing this family (2)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN116132703A | 2023-01-03 | 2023-05-16 | 网易(杭州)网络有限公司 | Live broadcast picture marking method and device, computer equipment and storage medium |
| CN116208823A | 2023-01-04 | 2023-06-02 | 腾竞体育文化发展(上海)有限公司 | Live broadcast camera shooting system, method, device, equipment and storage medium |
Citations (1)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN106534618A | 2016-11-24 | 2017-03-22 | 广州爱九游信息技术有限公司 | Method, device and system for realizing pseudo field interpretation |
Family Cites Families (4)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN107122093B | 2017-02-24 | 2020-06-16 | 北京悉见科技有限公司 | Information frame display method and device |
| CN108337573A | 2018-03-26 | 2018-07-27 | 京东方科技集团股份有限公司 | A kind of implementation method that race explains in real time and medium |
| CN110392310B | 2019-07-29 | 2022-06-03 | 北京奇艺世纪科技有限公司 | Display method of video identification information and related equipment |
| CN111556332B | 2020-05-22 | 2022-05-10 | 咪咕文化科技有限公司 | Live broadcast method, electronic device and readable storage medium |
- 2021-02-20: CN application CN202110190796.1A, patent CN114979741B, status Active
Also Published As

| Publication Number | Publication Date |
|---|---|
| CN114979741A | 2022-08-30 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40073944 |
| | GR01 | Patent grant | |