CN112866814A - Audio and video processing method and device - Google Patents
- Publication number
- CN112866814A (application number CN202011643037.8A)
- Authority
- CN
- China
- Prior art keywords
- processing
- plug
- audio
- frame data
- video stream
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/443—OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/637—Control signals issued by the client directed to the server or network components
- H04N21/6371—Control signals issued by the client directed to the server or network components directed to network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/65—Transmission of management data between client and server
- H04N21/654—Transmission by server directed to the client
Abstract
The application discloses an audio and video processing method and device. The method comprises: receiving a processing task, wherein the processing task describes the processing flow required for audio/video processing and at least comprises an identifier of the audio/video stream to be processed and a processing plug-in identifier list; pulling the audio/video stream corresponding to the identifier of the audio/video stream to be processed; and sequentially calling the processing plug-ins corresponding to the processing plug-in identifiers in the list to perform content processing on the audio/video stream. Through this process, a single audio/video stream can undergo multiple kinds of processing in one pass, reducing processing delay and improving the processing efficiency of the audio/video stream.
Description
Technical Field
The embodiment of the application relates to a multimedia data processing technology, in particular to a method and a device for processing audio and video.
Background
Currently, with the progress of network communication technology and the increasing speed of broadband networks, live webcasting is ever more widely developed and applied. To improve the anchor's live effect, a content processing service is typically used to process the content of the live video, for example AI (Artificial Intelligence) beautification, background segmentation, and whiteboard processing of the video.
In the related art, as shown in fig. 1, a method for processing video content includes the following steps: pulling streams through a plurality of receivers, decoding each video stream through a decoder, mixing the decoded video streams through a mixer (compositor), processing the mixed video stream during mixing through a plug-in programmed in the Lua language, encoding the processed video through an encoder, and pushing the encoded video stream through a pusher.
However, the above processing method has the following disadvantages:
1. plug-ins can only be written using the Lua language;
2. one pass of stream pulling, decoding, mixing, processing, encoding and stream pushing can apply only a single round of processing; if the video needs to be processed multiple times, the whole pipeline must be repeated, and the delay is high.
Disclosure of Invention
The application provides a method and a device for processing audio and video, which aim to solve the problems of high processing delay and limitation of processing plug-in units in the processing mode in the prior art.
In a first aspect, an embodiment of the present application provides a method for audio and video processing, where the method includes:
receiving a processing task, wherein the processing task is used for describing the processing flow required for audio/video processing, and the processing task at least comprises: an identifier of the audio/video stream to be processed and a processing plug-in identifier list;
pulling the audio/video stream corresponding to the identifier of the audio/video stream to be processed;
and sequentially calling the processing plug-ins corresponding to the processing plug-in identification in the processing plug-in identification list, and processing the content of the audio and video stream.
In a second aspect, an embodiment of the present application further provides an apparatus for audio and video processing, where the apparatus includes:
a task receiving module, configured to receive a processing task, wherein the processing task is used for describing the processing flow required for audio/video processing and at least comprises: an identifier of the audio/video stream to be processed and a processing plug-in identifier list;
the stream pulling module is used for pulling the audio and video stream corresponding to the identifier of the audio and video stream to be processed;
and a content processing module, configured to sequentially call the processing plug-ins corresponding to the processing plug-in identifiers in the processing plug-in identifier list and perform content processing on the audio/video stream.
In a third aspect, an embodiment of the present application further provides a server, where the server includes:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above.
In a fourth aspect, the present application further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method described above.
The technical solution provided by the present application has the following beneficial effects:
in this embodiment, a pipeline-style method for processing audio and video is provided, in which content processing is performed on audio/video streams according to a processing task. Specifically, after the processing task is received, stream pulling is performed according to the identifier of the audio/video stream to be processed in the task to obtain the audio/video stream. The processing plug-ins corresponding to the identifiers in the processing plug-in identifier list of the task are then sequentially called to perform content processing on the audio/video stream.
In addition, this embodiment assembles the processing plug-ins according to the processing plug-in identifier list; when multiple kinds of content processing are needed, only the corresponding processing plug-ins need to be added to the pipeline, which enriches the forms and flexibility of content processing.
In addition, when multiple audio/video streams are processed, post-processing can be performed on the merged frame data after mixing. One processing pipeline can thus realize multiple processing capabilities, and since stream pulling/pushing and encoding/decoding are each paid for only once, the delay introduced by content processing is better reduced.
Drawings
FIG. 1 is a schematic flow chart of content processing for video in the prior art as mentioned in the background of the present application;
fig. 2 is a flowchart of an embodiment of a method for audio and video processing according to an embodiment of the present application;
fig. 3 is a flowchart of an embodiment of a method for audio and video processing according to a second embodiment of the present application;
fig. 4 is a schematic diagram of an exemplary audio/video processing pipeline provided in a third embodiment of the present application;
fig. 5 is an exemplary flowchart illustrating content processing performed on a live stream according to a third embodiment of the present application;
fig. 5a is a schematic diagram of a composite image frame 1 according to the third embodiment of the present application;
fig. 5b is a schematic diagram of a composite image frame 2 according to the third embodiment of the present application;
fig. 6 is a block diagram of an embodiment of an audio and video processing apparatus according to a fourth embodiment of the present application;
fig. 7 is a schematic structural diagram of a server according to a fifth embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.
Example one
Fig. 2 is a flowchart of an embodiment of an audio and video processing method provided in an embodiment of the present application. The method is a cloud, pipeline-like processing method in which multiple processing plug-ins may be connected in series in the cloud to process an audio/video stream. As shown in fig. 2, this embodiment may be applied to a server, and the server may implement the method of this embodiment through one processing process.
The present embodiment may include the following steps:

Step 201, receiving a processing task.
The embodiment can be applied to scenes for performing content processing on audio and video streams, for example, processing such as beautifying and background segmentation on live video in a live scene. For the audio/video stream to be processed, a corresponding processing task can be created according to the processing requirement indicated by the user, and the processing task is sent to the processing process implementing the embodiment.
The way the demanding party creates the processing task is not limited in this embodiment. For example, the demanding party may be a service on a server, or a software program on a terminal (such as an APP on a mobile phone); when the demanding party detects a processing service demand triggered by a user, it may create the processing task according to that demand.
The processing task of this embodiment is used for describing the processing flow required for audio/video processing. Exemplarily, the processing task may at least include: the identifier of the audio/video stream to be processed and a processing plug-in identifier list. The processing plug-in identifier list consists of the identifiers of the processing plug-ins required for processing the audio/video stream of this embodiment; the list may include one or more processing plug-in identifiers, and the processing plug-in corresponding to each identifier is used to process the audio/video stream to be processed.
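The processing-task structure described above can be sketched as a small data class. The field names, function name and validation rule below are hypothetical illustrations, not taken from this application:

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingTask:
    # identifier of the audio/video stream to be processed
    stream_id: str
    # ordered processing plug-in identifier list
    plugin_ids: list[str] = field(default_factory=list)

def validate_task(task: ProcessingTask) -> bool:
    """A usable task names a stream and at least one processing plug-in."""
    return bool(task.stream_id) and len(task.plugin_ids) >= 1

task = ProcessingTask(stream_id="live/room_42",
                      plugin_ids=["beautify", "bg_segment"])
```

A demanding party would build such a task from the user-triggered demand and hand it to the processing process.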
Step 202, pulling the audio/video stream corresponding to the identifier of the audio/video stream to be processed.
In one implementation, after determining the identifier of the audio/video stream to be processed, the processing process may pull the audio/video stream corresponding to the identifier of the audio/video stream to be processed through a pull stream thread (Puller).
It should be noted that there may be one or more audio/video streams in this embodiment; when there are multiple streams, multiple Pullers may be used to pull them. The Puller can adapt to different input streams: for example, the audio/video stream may be an rtmp-protocol stream, audio/video data from a specified audio/video network, or a file in mp4, flv or similar format.
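The Puller's adaptation to different input types can be illustrated with a hypothetical dispatcher. The protocol and format names follow the text above, but the mapping logic and function name are assumptions for illustration:

```python
def select_puller(stream_id: str) -> str:
    """Pick a pull strategy from the stream identifier (illustrative)."""
    if stream_id.startswith("rtmp://"):
        return "rtmp"                 # rtmp-protocol audio/video stream
    if stream_id.endswith((".mp4", ".flv")):
        return "file"                 # mp4 / flv format file
    return "av-network"               # data from the specified audio/video network

kind = select_puller("rtmp://live/room_42")
```

For multiple input streams, one such Puller would be created per stream identifier.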
In an embodiment, after the step 202 completes the pull flow, the embodiment may further include the following steps:
and decoding the audio and video stream into frame data, and storing the frame data in a shared memory.
In this step, after pulling the stream, the Puller may pass the pulled audio/video stream to a decoder (Decoder), which decodes it into frame data. For video, the frame data may be image frames in rgba format; for audio, the frame data may be uncompressed PCM.
The frame data may then be stored in the shared memory and marked with a memory offset: the storage location of the frame data in the shared memory is obtained first, and the memory offset of that location is then calculated.
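The store-then-mark-with-offset idea can be sketched with Python's `multiprocessing.shared_memory` module. The simple bump allocator below is an illustrative assumption, not the patent's allocation scheme:

```python
from multiprocessing import shared_memory

# a shared pool that the process and plug-ins would both attach to
shm = shared_memory.SharedMemory(create=True, size=1024 * 1024)
_next_free = 0

def store_frame(frame_bytes: bytes) -> tuple[int, int]:
    """Copy decoded frame data into the pool; return (offset, length)."""
    global _next_free
    offset = _next_free
    shm.buf[offset:offset + len(frame_bytes)] = frame_bytes
    _next_free += len(frame_bytes)
    return offset, len(frame_bytes)

def read_frame(offset: int, length: int) -> bytes:
    """What a plug-in does on its side: read the frame back by offset."""
    return bytes(shm.buf[offset:offset + length])

off, n = store_frame(b"\x10\x20\x30")   # a tiny stand-in for an rgba frame
```

Only `(off, n)` would cross the process/plug-in boundary; the frame bytes themselves stay in the pool.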
Step 203, sequentially calling the processing plug-ins corresponding to the processing plug-in identifiers in the processing plug-in identifier list, and performing content processing on the audio/video stream.
In this step, the processing plug-in corresponding to each processing plug-in identifier in the list may be called to perform content processing on the frame data. These processing plug-ins work in series: the output of the previous plug-in serves as the input of the next, until all plug-ins have finished.
In one embodiment, the processing plug-ins may be various audio or video processing plug-ins, such as cloud beautification and cloud background replacement, provided by different capability teams. This embodiment supports assembling the processing plug-ins, and various plug-ins can be accessed as business requirements dictate. In addition, the programming language of a plug-in is not limited in this embodiment; a plug-in provider can freely choose Lua, Python, C++ or another language to write it.
In one embodiment, the processing plug-in identifier list may include at least two processing plug-in identifiers that are executed in sequence, and step 203 may include the following steps:
step 203-1, sequentially traversing the processing plug-in identifiers in the processing plug-in identifier list according to the execution sequence, calling the processing plug-in corresponding to the processing plug-in identifier for the current processing plug-in identifier, and sending the memory offset of the current frame data to be processed to the processing plug-in.
In this embodiment, if the processing plug-in identifier list includes two or more processing plug-in identifiers, the list may further mark their execution order. The processing process may traverse the identifiers in the list sequentially in that order.
For the currently traversed processing plug-in identification, the corresponding processing plug-in can be called according to the processing plug-in identification, and then the memory offset of the current frame data to be processed is sent to the processing plug-in.
In one embodiment, the processing process may call the processing plug-in corresponding to an identifier through a Thrift interface based on the remote procedure call (RPC) protocol (i.e., thrift RPC). Of course, the process may instead communicate with each processing plug-in using another negotiated communication protocol.
Step 203-2, the processing plug-in reads corresponding frame data from the shared memory according to the memory offset, and processes the frame data.
In this step, after the called processing plugin receives the memory offset of the current frame data to be processed, the memory location of the current frame data to be processed in the shared memory may be determined according to the memory offset, and the corresponding frame data is read from the memory location, and then the content processing is performed on the read frame data by using the processing logic of the current processing plugin.
Step 203-3, storing the processed frame data in the shared memory, continuously traversing the next processing plug-in identifier, calling the corresponding processing plug-in to read the processed frame data from the shared memory for processing, and so on until the processing plug-in identifier in the processing plug-in identifier list is traversed completely.
In this step, after the current processing plug-in performs content processing on the frame data, the processed frame data may be stored in the shared memory. In one implementation, the processed frame data may be stored in an original storage location and overwritten with the original frame data. Meanwhile, after the processed frame data is stored in the shared memory, the current processing plugin can send a notification message to the processing process to notify the processing process that the processing of the current plugin is completed and trigger the processing of the next plugin.
After receiving the notification message, the processing process may determine that the current processing plugin has completed its work, and continue traversing the next processing plugin identifier in the processing plugin identifier list, call the corresponding processing plugin, and send the memory offset of the frame data to the processing plugin, where the processing plugin repeats the flows of step 203-2 and step 203-3 until all the processing plugin identifiers in the processing plugin identifier list are traversed and all the processing plugins corresponding to all the processing plugin identifiers are processed.
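The traversal of steps 203-1 to 203-3 can be condensed into a short sketch. The in-process registry and dictionary "pool" below stand in for the thrift RPC calls and shared memory described above; names and frame representation are illustrative:

```python
frame_pool = {0: "raw-frame"}            # offset -> frame data (toy stand-in)

def grayscale_plugin(offset):
    # a plug-in reads the frame by offset, processes it, writes it back
    frame_pool[offset] = frame_pool[offset] + "+gray"

def watermark_plugin(offset):
    frame_pool[offset] = frame_pool[offset] + "+mark"

PLUGIN_REGISTRY = {"gray": grayscale_plugin, "mark": watermark_plugin}

def run_pipeline(plugin_ids, offset):
    for pid in plugin_ids:               # execution order = list order
        PLUGIN_REGISTRY[pid](offset)     # send only the memory offset
    return frame_pool[offset]            # output of the last plug-in

result = run_pipeline(["gray", "mark"], offset=0)
```

In the real scheme each call would be an RPC, and the "done" notification from the plug-in would trigger the next iteration.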
In other embodiments, the processed frame data may also be stored in other locations instead of the original storage location, and at this time, the current processing plugin may obtain the memory offset of the processed frame data and carry the memory offset in the notification message. The processing process may extract the memory offset from the notification message and send the memory offset to the next processing plugin when calling the next processing plugin, and the next processing plugin may read the processed frame data according to the memory offset and perform re-processing.
In this embodiment, the frame data is stored in the shared memory, and only the memory offset is transmitted without transmitting the frame data between the process and the plug-in, so that the transmission delay caused by transmitting the frame data between the multi-stage plug-ins can be effectively avoided.
Of course, in embodiments without shared memory, after the decoder decodes the audio/video stream, the decoded frame data may be transmitted directly to the processing plug-in corresponding to the first identifier in the processing plug-in identifier list (first in execution order; "second" below is analogous). After the first processing plug-in finishes content processing on the frame data, it transmits the processed frame data to the plug-in corresponding to the second identifier in the list for processing; after the second plug-in finishes, the frame data is transmitted to the third, and so on, until every processing plug-in corresponding to an identifier in the list has run. The frame data output by the last processing plug-in is the processing result of the current stage.
In one embodiment, when at least two processing plug-ins are called, they may be deployed in one or more containers of the same physical machine.
In this embodiment, when two or more processing plug-ins are required, they may all be deployed in the same container, each in a separate container, or spread across multiple containers (some containers holding one plug-in and some holding at least two). The specific deployment may be determined by business requirements and is not limited in this embodiment; however, to reduce transmission delay, the plug-ins should be deployed on the same machine.
Of course, in other implementations, the processing plug-ins may be deployed on different machines to meet different business requirements.
In this embodiment, a pipeline-style method for processing audio and video is provided, in which content processing is performed on audio/video streams according to a processing task. Specifically, after the processing task is received, stream pulling is performed according to the identifier of the audio/video stream to be processed in the task to obtain the audio/video stream. The processing plug-ins corresponding to the identifiers in the processing plug-in identifier list of the task are then sequentially called to perform content processing on the audio/video stream.
In addition, this embodiment assembles the processing plug-ins according to the processing plug-in identifier list; when multiple kinds of content processing are needed, only the corresponding processing plug-ins need to be added to the pipeline, which enriches the forms and flexibility of content processing. Content processing is triggered by means of a processing task, and input/output paths and parameters can be customized according to actual business requirements.
Example two
Fig. 3 is a flowchart of an embodiment of a method for audio and video processing provided in an embodiment of the present application. This embodiment builds on the first embodiment: the audio/video stream here may include multiple audio/video streams, and the processing task may then further include one or more mixed-flow targets and a post-processing plug-in identifier list corresponding to each mixed-flow target, where each post-processing plug-in identifier list may include one or more processing plug-in identifiers.
After the step 203 of the first embodiment, the present embodiment may further include the following steps:
and 204, aiming at each mixed flow target, merging the frame data which are processed by the content processing and are at the same time in each path of audio and video stream according to the mixed flow target to generate merged frame data.
In this step, after each piece of frame data has been processed in step 203, the processed frame data may be mixed and merged to generate merged frame data. In this embodiment, the processing task may further include one or more mixed-flow targets, where each mixed-flow target corresponds to one frame of merged frame data.
Illustratively, a mixed-flow target may include a merging effect, or the steps required for mixing; for example, one mixed-flow target may be to combine the image frames and add a progress bar.
In one embodiment, step 204 may include the steps of:
and step 204-1, calling a mixed flow plug-in corresponding to the mixed flow target.
In this embodiment, besides the processing plug-ins, there may also be mixed-flow plug-ins. Each mixed-flow plug-in has an associated function description, so a suitable mixed-flow plug-in can be selected according to the mixed-flow target to perform the mixing. A mixed-flow plug-in merges multiple frames of data into one frame and performs the related image processing.
Step 204-2, merging, by the mixed-flow plug-in, the content-processed frame data at the same time point in each audio/video stream.
In this step, after the mixed-flow plug-in is determined, its mixing logic may be used to merge the multiple pieces of frame data at the same time point, generating merged frame data.
In an application scenario with a shared memory, the mixed-flow plug-in can read multiple paths of frame data at the same time from the shared memory and perform mixed-flow processing on the multiple paths of frame data.
In an application scenario without shared memory, the mixed-flow plug-in can receive, from the last processing plug-in of each audio/video stream, that stream's processed frame data, and then combine the multiple pieces of frame data into a single frame.
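The merging of same-time frames can be illustrated with a toy mixed-flow plug-in. Side-by-side concatenation is only one possible mixed-flow target, and representing each frame as a list of pixel-value rows is a simplification of the rgba frame data:

```python
def mix_side_by_side(frames):
    """Merge same-timestamp frames of several streams into one composite frame.

    frames: list of equally sized frames, each a list of pixel-value rows.
    """
    # concatenate row r of every stream into one composite row
    return [sum((f[r] for f in frames), []) for r in range(len(frames[0]))]

f1 = [[0, 0], [0, 0]]           # 2x2 frame from stream 1 at time t
f2 = [[255, 255], [255, 255]]   # 2x2 frame from stream 2 at the same time t
merged = mix_side_by_side([f1, f2])   # one 2x4 merged frame
```

A progress bar or other effect named by the mixed-flow target would be drawn onto `merged` afterwards.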
Step 205, sequentially calling the processing plug-ins corresponding to the identifiers in the post-processing plug-in identifier list of the mixed-flow target, and post-processing the merged frame data.
In this embodiment, for the merged frame data obtained after mixing, if one or more post-processing plug-in identifier lists corresponding to the mixed-flow target exist in the processing task, the merged frame data needs to be post-processed.
Specifically, each post-processing plug-in identifier list represents one processing requirement and correspondingly outputs one audio/video stream, so multiple lists yield multiple output streams. For each post-processing plug-in identifier list, the processing plug-ins corresponding to its identifiers may be called in sequence to post-process the current merged frame data. The post-processing procedure is similar to the pre-mixing content processing of step 203; refer to the implementation of step 203, which is not repeated here.
Step 206, encoding the post-processed merged frame data to generate a target audio/video stream, and pushing the target audio/video stream.
After post-processing of the merged frame data is completed, the post-processed frame data can be encoded to generate the target audio/video stream, which is then pushed.
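Steps 205 and 206 can be sketched with stub stages; `encode` and `push` below are placeholders, not a real encoder or RTMP pusher, and the progress-bar plug-in is a hypothetical example:

```python
def post_process(frame, plugin_fns):
    """Step 205: same serial plug-in scheme as step 203, on merged frames."""
    for fn in plugin_fns:
        frame = fn(frame)
    return frame

def encode(frame):
    return f"enc({frame})"        # placeholder for a real encoder

def push(stream):
    return f"pushed:{stream}"     # placeholder for stream pushing

def add_progress_bar(frame):      # one hypothetical post-processing plug-in
    return frame + "+bar"

# one post-processing plug-in identifier list -> one output stream
out = push(encode(post_process("merged", [add_progress_bar])))
```

With several post-processing lists, this tail would run once per list, producing one pushed stream each.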
In this embodiment, after an audio/video processing task is dispatched, a series of audio/video stream operations (stream pulling, decoding, plug-in processing, mixing, plug-in post-processing, encoding and stream pushing) is performed, and the processed audio/video stream is then output. One processing pipeline can thus realize multiple processing capabilities while incurring the push/pull and encode/decode overhead only once, which better reduces the delay caused by content processing; for example, with the scheme of this embodiment the content processing delay can be kept within 50 ms.
Embodiment Three
Fig. 4 is a schematic diagram of an exemplary audio/video processing pipeline provided in the third embodiment of the present application. This embodiment combines the first and second embodiments to describe the processing of multiple audio/video streams and the post-processing after mixing.
Compared with the prior art, this application redefines the content processing framework: the frame data moves along something like a production line and, after passing through a number of procedures, yields the content-processed audio/video. As shown in Fig. 4, this example may be executed in a processing node Pod (a Pod is the smallest deployable computing unit that can be created and managed in Kubernetes; it may contain a single container or a small number of tightly coupled containers). The whole content processing procedure may comprise the stages Puller, MediaFlow, Compositor, and Pusher, described as follows:
Puller: this stage adapts to different inputs, which may be audio/video streams of the RTMP protocol, audio/video data of a specified audio/video network, or files in MP4 or FLV format. After decoding by the decoder, the output of the Puller stage is unified into frame data.
MediaFlow: as shown in Fig. 4, each MediaFlow may define N processing plug-ins, whose inputs and outputs are both frame data; frame data enters the 1st processing plug-in, is processed, is passed to the 2nd processing plug-in, and so on. The order and functions of the processing plug-ins can be assembled freely: for example, if plug-in a grays the frame data and plug-in b renders it black-and-white, the output of a MediaFlow of a + b is both gray and black-and-white. The processing process communicates with the plug-ins over Thrift RPC, supporting Python, C++, Lua, and other languages, which gives the access party a high degree of freedom. Meanwhile, the frame data is stored in a shared memory pool, so only memory offsets, rather than frame data, are passed between the process and the plug-ins, avoiding the transmission delay of frame data between the multi-stage plug-ins.
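The free assembly of plug-ins described for MediaFlow (plug-in a grays the frame, plug-in b renders it black-and-white, and a + b yields both effects) can be sketched as follows; the plug-in functions and the dict-based frame are purely illustrative assumptions, not this application's actual plug-in interface:

```python
def gray_plugin(frame):
    # "plug-in a": mark the frame as grayed.
    return dict(frame, gray=True)

def bw_plugin(frame):
    # "plug-in b": mark the frame as black-and-white.
    return dict(frame, bw=True)

def media_flow(frame, plugins):
    # Plug-ins execute in series: the output of one feeds the next,
    # and the last plug-in's output is the MediaFlow's output.
    for plugin in plugins:
        frame = plugin(frame)
    return frame

# Assemble a + b freely; the result carries both effects.
out = media_flow({"pixels": "..."}, [gray_plugin, bw_plugin])
```

Reordering or swapping the list of plug-ins changes the processing chain without touching the MediaFlow itself, which is the "free assembly" property described above.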
Compositor: the work of this stage is mixing: it defines multiple inputs, combines all input frame data into single frame data, and finally produces only one output.
Pusher: this stage converts frame data into different output forms. The input is frame data; the output may be an audio/video stream of the RTMP protocol, or files in MP4 or FLV format.
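A minimal end-to-end sketch of the four stages may look as follows; all stage implementations are toy stand-ins operating on strings, offered only to show the data flow, not the actual components of this application:

```python
def puller(source):
    # Puller: adapt a given input and decode it; output is unified frame data.
    return [f"frame:{source}"]

def media_flow(frames, plugins):
    # MediaFlow: run the per-stream plug-in chain over every frame.
    for plugin in plugins:
        frames = [plugin(f) for f in frames]
    return frames

def compositor(streams):
    # Compositor: multiple inputs, all input frames combined into single frames.
    return ["+".join(frames) for frames in zip(*streams)]

def pusher(frames):
    # Pusher: convert frame data into an output form (kept trivial here).
    return list(frames)

# Two inputs, each with its own MediaFlow, then one mixed output.
s1 = media_flow(puller("rtmp://a"), [str.upper])
s2 = media_flow(puller("rtmp://b"), [str.upper])
out = pusher(compositor([s1, s2]))
```

Note how the Compositor is the single point where the multiple per-stream pipelines converge into one output, matching the stage description above.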
As shown in Fig. 4, when a task is dispatched, a processing process (i.e., the daemon process in Fig. 4) receives the processing task through Pipeline (the video stream processing pipeline service) and parses it to obtain the identifiers of the audio/video streams to be processed. Then, in the Puller stage, streams are pulled from the source corresponding to the identifier of each audio/video stream to be processed through the pulling pipelines Puller 1 … Puller N, and the pulled audio/video streams are decoded by the Decoder to obtain frame data.
Then, in one mode, the frame data produced by decoding in the Puller stage may be stored in shared memory for subsequent processing plug-ins to read via memory offsets. In another mode, the Decoder may send audio or video frames (Audio/Video, A/V for short) directly to the processing plug-ins for processing.
Then, in the MediaFlow stage, each audio/video stream has a corresponding MediaFlow, and each MediaFlow includes one or more processing plug-ins (i.e., plug-in 1 … plug-in N in Fig. 4, such as a beautification plug-in for face beautification or a background segmentation plug-in for background segmentation). Content processing is performed on an audio frame or a video frame through plug-in 1 … plug-in N in turn; the processing plug-ins execute in series, the result of the previous processing plug-in serves as the input of the next, and the output of the last processing plug-in is the output of the current MediaFlow.
If the frame data is stored in shared memory, each processing plug-in may store its processed result in the shared memory, obtain the memory offset of the result in the shared memory, and send the offset to the processing process; the processing process then calls the next processing plug-in, which reads the corresponding frame data according to the memory offset and processes it, and so on. In one implementation, the same frame data is stored at the same location in the shared memory, where later stored data overwrites earlier stored data. The shared memory IPC shown in Fig. 4 means that the containers in a Pod share the same IPC namespace and can communicate via shared memory using System V or POSIX mechanisms.
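The offset-passing scheme described above, in which only a memory offset travels between the processing process and the plug-ins and a later result overwrites the earlier one at the same location, can be sketched as follows; the dict-backed pool is an illustrative stand-in for a real shared-memory segment, not this application's actual implementation:

```python
pool = {}                                  # offset -> frame data (toy shared memory)

def write_frame(offset, frame):
    # Same offset, same frame: a later write overwrites the earlier data.
    pool[offset] = frame

def plugin_step(offset, transform):
    frame = pool[offset]                   # plug-in reads the frame via its offset only
    write_frame(offset, transform(frame))  # result is stored back in place
    return offset                          # only the offset is sent to the next stage

off = 0
write_frame(off, "raw")
# Multi-stage plug-in chain: each stage receives an offset, never frame data.
for transform in [str.upper, lambda s: s + "!"]:
    off = plugin_step(off, transform)
```

Because only the offset crosses the process/plug-in boundary, the cost of each hop is independent of the frame size, which is the transmission-delay saving claimed above.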
After the MediaFlow of each audio/video stream completes, the Compositor stage is entered: the Compositor receives the processed frame data, or reads the corresponding frame data from the shared memory, and then mixes the frame data according to the mixing target.
After the mixing is finished, a post-processing MediaFlow stage is entered, in which the merged frame data generated by mixing can be post-processed by calling one or more processing plug-ins; the post-processed frame data is then encoded into an audio/video stream, which enters the Pusher stage for stream pushing.
The above process is illustrated below with a specific example:
As shown in Fig. 5, assume that in a live streaming scenario there are two input streams, i.e., video stream 1 of anchor 1 and video stream 2 of anchor 2, and that image frames in RGB format are obtained by pulling and decoding the streams. In the processing stage, a beautification plug-in is applied to the image frames of anchor 1, or an avatar plug-in is used to enable an avatar effect; likewise, a beautification plug-in or an avatar plug-in is applied to the image frames of anchor 2. Then the processed image frames of anchor 1 and anchor 2 are mixed, and different composite images are obtained for different mixing targets. For example, in Fig. 5, one mixing target composites the images and adds a PK bar, yielding composite image frame 1 (as may be shown in Fig. 5a); another mixing target composites the images with a different layout or picture size, yielding composite image frame 2 (as may be shown in Fig. 5b).
For the composite image frames after mixing, post-processing can be performed according to different post-processing targets. For example, as shown in Fig. 5, for composite image frame 1, a processing plug-in may be used to replace the background (i.e., post-processing 1 in Fig. 5); other processing plug-ins may be used to post-process composite image frame 2 (i.e., post-processing 2 in Fig. 5). The post-processed image frames are then encoded to generate composite streams, which are published.
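The two mixing targets of this example (a composite with a PK bar, and a composite with a different layout) can be sketched as follows; the target names and result fields are illustrative assumptions rather than this application's actual data structures:

```python
def mix(frame1, frame2, target):
    """Combine two processed frames according to a mixing target.

    Different targets yield different composite frames from the same inputs.
    """
    if target == "pk":          # composite the images and add a PK bar
        return {"frames": [frame1, frame2], "overlay": "pk_bar"}
    if target == "grid":        # composite with a different layout / picture size
        return {"frames": [frame1, frame2], "layout": "grid"}
    raise ValueError(f"unknown mixing target: {target}")

# One pair of processed anchor frames, two mixing targets, two composites.
composite1 = mix("anchor1_frame", "anchor2_frame", "pk")
composite2 = mix("anchor1_frame", "anchor2_frame", "grid")
```

Each composite frame can then be routed to its own post-processing plug-in list, as in Fig. 5.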
This embodiment can obtain the following beneficial effects:
Low delay: the delay of this embodiment can be kept within 50 ms;
Plug-ins in multiple languages: because the plug-ins run outside the process, the access party can freely write plug-ins in Python, C/C++, Lua, Node.js, and other languages;
Free assembly: one framework can be flexibly applied to various services.
Embodiment Four
Fig. 6 is a structural block diagram of an embodiment of an audio/video processing apparatus provided in the fourth embodiment of the present application; the apparatus may include the following modules:
a task receiving module 601, configured to receive a processing task, wherein the processing task describes a processing flow required for audio/video processing and comprises at least: an identifier of an audio/video stream to be processed and a processing plug-in identifier list;
a stream pulling module 602, configured to pull the audio/video stream corresponding to the identifier of the audio/video stream to be processed; and
a content processing module 603, configured to sequentially call the processing plug-ins corresponding to the processing plug-in identifiers in the processing plug-in identifier list, and perform content processing on the audio/video stream.
In one embodiment, the apparatus may further include the following modules:
a decoding module, configured to decode the audio/video stream into frame data after the audio/video stream corresponding to the identifier of the audio/video stream to be processed is pulled; and
a storage module, configured to store the frame data in a shared memory.
In one embodiment, the processing plug-in identifier list includes at least two processing plug-in identifiers and an execution order of the at least two processing plug-in identifiers;
the content processing module 603 may include the following sub-modules:
a plug-in calling submodule, configured to traverse the processing plug-in identifiers in the processing plug-in identifier list in the execution order and, for the current processing plug-in identifier, call the processing plug-in corresponding to the processing plug-in identifier and send the memory offset of the current frame data to be processed to the processing plug-in;
a processing submodule, configured for the processing plug-in to read the corresponding frame data from the shared memory according to the memory offset and to process the frame data; and
a storage submodule, configured to store the processed frame data in the shared memory, continue traversing to the next processing plug-in identifier, call the corresponding processing plug-in to read the processed frame data from the shared memory for processing, and so on, until all the processing plug-in identifiers in the processing plug-in identifier list have been traversed.
In one embodiment, the plug-in calling submodule is specifically configured to:
call the processing plug-in corresponding to the processing plug-in identifier via a Thrift interface of the remote procedure call (RPC) protocol.
In one embodiment, when the number of called processing plug-ins is at least two, the at least two processing plug-ins are deployed in one or more containers of the same physical machine.
In one embodiment, the audio/video stream comprises multiple audio/video streams, and the processing task further comprises: one or more mixing targets and a post-processing plug-in identifier list corresponding to each mixing target;
the apparatus may further include the following modules:
a frame data merging module, configured to merge, for each mixing target, the content-processed frame data at the same time in each audio/video stream according to the mixing target, to generate merged frame data;
a post-processing module, configured to sequentially call the processing plug-ins corresponding to the processing plug-in identifiers in the post-processing plug-in identifier list corresponding to the mixing target, and post-process the merged frame data; and
an encoding and stream pushing module, configured to encode the post-processed merged frame data to generate a target audio/video stream and to push the target audio/video stream.
In an embodiment, the frame data merging module is specifically configured to:
call a mixing plug-in corresponding to the mixing target; and
merge, with the mixing plug-in, the content-processed frame data at the same time in each audio/video stream.
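The merging of same-time frames across multiple streams, as performed by the frame data merging module, can be sketched as follows; keying frames by timestamp is an assumption made for illustration, not this application's actual frame format:

```python
def merge_streams(streams):
    """Merge the frames at the same time across several processed streams.

    streams: a list of {timestamp: frame} dicts, one per audio/video stream.
    Returns {timestamp: merged frame}, merging only timestamps present in
    every stream.
    """
    common = set.intersection(*(set(s) for s in streams))
    merged = {}
    for ts in sorted(common):               # only frames that coexist in time merge
        merged[ts] = [s[ts] for s in streams]
    return merged

# Two processed streams, two shared timestamps -> two merged frames.
m = merge_streams([{0: "a0", 1: "a1"}, {0: "b0", 1: "b1"}])
```

Each merged frame would then be handed to the mixing plug-in for the actual compositing.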
The apparatus provided in this embodiment of the present application can execute the methods provided in the first to third embodiments of the present application, and has the corresponding functional modules and beneficial effects of the executed methods.
Embodiment Five
Fig. 7 is a schematic structural diagram of a server according to the fifth embodiment of the present application. As shown in Fig. 7, the server includes a processor 710, a memory 720, an input device 730, and an output device 740; there may be one or more processors 710 in the server, with one processor 710 taken as an example in Fig. 7; the processor 710, memory 720, input device 730, and output device 740 in the server may be connected by a bus or other means, with bus connection taken as the example in Fig. 7.
The memory 720, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the above embodiments of the present application. The processor 710 executes the software programs, instructions, and modules stored in the memory 720 to perform the various functional applications and data processing of the server, that is, to implement the methods mentioned in the above method embodiments.
The memory 720 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal, and the like. Further, the memory 720 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 720 may further include memory located remotely from the processor 710, which may be connected to the device/terminal/server via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 730 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the server. The output device 740 may include a display device such as a display screen.
Embodiment Six
The sixth embodiment of the present application further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are used for executing the method in the above-mentioned method embodiment.
Of course, the storage medium provided in the embodiments of the present application contains computer-executable instructions, and the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the method provided in any embodiments of the present application.
From the above description of the embodiments, it is obvious for those skilled in the art that the present application can be implemented by software and necessary general hardware, and certainly can be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling an electronic device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present application.
It should be noted that, in the embodiment of the apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the application.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present application and the technical principles employed. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.
Claims (10)
1. An audio/video processing method, the method comprising:
receiving a processing task, wherein the processing task describes a processing flow required for audio/video processing and comprises at least: an identifier of an audio/video stream to be processed and a processing plug-in identifier list;
pulling the audio/video stream corresponding to the identifier of the audio/video stream to be processed; and
sequentially calling the processing plug-ins corresponding to the processing plug-in identifiers in the processing plug-in identifier list, and performing content processing on the audio/video stream.
2. The method according to claim 1, wherein after the pulling of the audio/video stream corresponding to the identifier of the audio/video stream to be processed, the method further comprises:
decoding the audio/video stream into frame data; and
storing the frame data in a shared memory.
3. The method of claim 2, wherein the processing plug-in identifier list includes at least two processing plug-in identifiers and an execution order of the at least two processing plug-in identifiers; and
wherein sequentially calling the processing plug-ins corresponding to the processing plug-in identifiers in the processing plug-in identifier list and performing content processing on the audio/video stream comprises:
traversing the processing plug-in identifiers in the processing plug-in identifier list in the execution order and, for the current processing plug-in identifier, calling the processing plug-in corresponding to the processing plug-in identifier and sending the memory offset of the current frame data to be processed to the processing plug-in;
reading, by the processing plug-in, the corresponding frame data from the shared memory according to the memory offset, and processing the frame data; and
storing the processed frame data in the shared memory, continuing to traverse to the next processing plug-in identifier, calling the corresponding processing plug-in to read the processed frame data from the shared memory for processing, and so on, until all the processing plug-in identifiers in the processing plug-in identifier list have been traversed.
4. The method according to claim 3, wherein calling, for the current processing plug-in identifier, the processing plug-in corresponding to the processing plug-in identifier comprises:
calling the processing plug-in corresponding to the processing plug-in identifier via a Thrift interface of the remote procedure call (RPC) protocol.
5. The method of claim 3 or 4, wherein when the number of called processing plug-ins is at least two, the at least two processing plug-ins are deployed in one or more containers of the same physical machine.
6. The method according to any of claims 2-4, wherein the audio/video stream comprises multiple audio/video streams, and the processing task further comprises: one or more mixing targets and a post-processing plug-in identifier list corresponding to each mixing target; and
wherein after the processing plug-ins corresponding to the processing plug-in identifiers in the processing plug-in identifier list are sequentially called and content processing is performed on the audio/video stream, the method further comprises:
for each mixing target, merging the content-processed frame data at the same time in each audio/video stream according to the mixing target, to generate merged frame data;
sequentially calling the processing plug-ins corresponding to the processing plug-in identifiers in the post-processing plug-in identifier list corresponding to the mixing target, and post-processing the merged frame data; and
encoding the post-processed merged frame data to generate a target audio/video stream, and pushing the target audio/video stream.
7. The method according to claim 6, wherein, for each mixing target, merging the content-processed frame data at the same time in each audio/video stream according to the mixing target comprises:
calling a mixing plug-in corresponding to the mixing target; and
merging, with the mixing plug-in, the content-processed frame data at the same time in each audio/video stream.
8. An audio/video processing apparatus, the apparatus comprising:
a task receiving module, configured to receive a processing task, wherein the processing task describes a processing flow required for audio/video processing and comprises at least: an identifier of an audio/video stream to be processed and a processing plug-in identifier list;
a stream pulling module, configured to pull the audio/video stream corresponding to the identifier of the audio/video stream to be processed; and
a content processing module, configured to sequentially call the processing plug-ins corresponding to the processing plug-in identifiers in the processing plug-in identifier list and perform content processing on the audio/video stream.
9. A server, characterized in that the server comprises:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011643037.8A CN112866814A (en) | 2020-12-30 | 2020-12-30 | Audio and video processing method and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN112866814A true CN112866814A (en) | 2021-05-28 |
Family
ID=76000882
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011643037.8A Pending CN112866814A (en) | 2020-12-30 | 2020-12-30 | Audio and video processing method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112866814A (en) |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100080283A1 (en) * | 2008-09-29 | 2010-04-01 | Microsoft Corporation | Processing real-time video |
| US20130276048A1 (en) * | 2012-04-12 | 2013-10-17 | Google Inc. | Live streaming video processing |
| CN103679800A (en) * | 2013-11-21 | 2014-03-26 | 北京航空航天大学 | System for generating virtual scenes of video images and method for constructing frame of system |
| US20180210694A1 (en) * | 2017-01-26 | 2018-07-26 | Gibson Brands, Inc. | Plug-in load balancing |
| CN109491718A (en) * | 2018-09-13 | 2019-03-19 | 北京米文动力科技有限公司 | A kind of plug-in loading method and equipment |
| CN110569083A (en) * | 2019-08-07 | 2019-12-13 | 上海联影智能医疗科技有限公司 | Image segmentation processing method, device, computer equipment and storage medium |
| CN110708609A (en) * | 2019-08-05 | 2020-01-17 | 青岛海信传媒网络技术有限公司 | Video playing method and device |
| CN111083561A (en) * | 2019-12-31 | 2020-04-28 | 深圳市商汤科技有限公司 | Video processing method, device, equipment and storage medium |
| CN111147801A (en) * | 2019-12-31 | 2020-05-12 | 视联动力信息技术股份有限公司 | Video data processing method and device for video networking terminal |
| CN111562945A (en) * | 2020-04-01 | 2020-08-21 | 杭州博雅鸿图视频技术有限公司 | Multimedia processing method, device, equipment and storage medium |
| CN112040271A (en) * | 2020-09-04 | 2020-12-04 | 杭州七依久科技有限公司 | Cloud intelligent editing system and method for visual programming |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210528 |