
CN119031194B - Video recording device and audio and video synchronous output method - Google Patents


Info

Publication number
CN119031194B
CN119031194B (application CN202411455041.XA)
Authority
CN
China
Prior art keywords
image acquisition
data
audio
instruction
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411455041.XA
Other languages
Chinese (zh)
Other versions
CN119031194A (en)
Inventor
刘健
陈剑
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority claimed from CN202411455041.XA
Publication of CN119031194A
Application granted
Publication of CN119031194B
Legal status: Active

Classifications

    (CPC, all within H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD])
    • H04N21/4622: Retrieving content or additional data from different sources, e.g. from a broadcast channel and the Internet
    • H04N21/43072: Synchronising the rendering of multiple content streams or additional data on the same device
    • H04N21/433: Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4394: Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

An embodiment of the present application provides a video recording device and an audio/video synchronous output method, relating to the technical field of audio and video. In response to an output instruction, the video recording device acquires and parses the original audio/video data of the target image acquisition device indicated by the instruction to obtain target video data, acquires target audio data from the device bound to the target image acquisition device, and synchronously outputs the target audio data and the target video data. On receiving a switching instruction, it switches the device bound to the target image acquisition device to the device indicated by the switching instruction, acquires that device's audio data as new target audio data, and synchronously plays the target video data and the new target audio data. On receiving a confirmation instruction for the target video data and the new target audio data, it synchronously outputs them to obtain the target audio/video data. Applying the scheme provided by the embodiments of the present application simplifies the evidence-collection workflow.

Description

Video recording equipment and audio and video synchronous output method
Technical Field
The present application relates to the technical field of audio and video, and in particular to a video recording device and an audio/video synchronous output method.
Background
Traditional single-channel video monitoring systems struggle to meet complex and changing detection requirements. In non-private public service locations such as hospitals, bank counters, and office halls, or in places and environments where authorization for information collection has been obtained, abnormal situations need to be monitored through video footage, and the conversations between clients and staff also need to be captured, so that comprehensive and objective evidence is available when disputes arise.
In the related art, a traditional audio/video acquisition device such as a camera is generally used to acquire and store audio/video data, so that staff can conveniently monitor it in real time or play it back through a video recorder. When evidence is collected in some scenes, however, the camera filming the scene of a conversation cannot clearly capture the conversation's content, so the evidence is incomplete. For example, suppose a doctor writes a prescription for a patient containing medicine A and medicine B, the pharmacy happens to have no stock of medicine A, and a pharmacy staff member replaces medicine A with medicine C, which has the same composition and effect, explaining the substitution to the patient. When a dispute later arises between the patient and the hospital, the hospital needs to retrieve the pharmacy's audio/video data in its own defense. If the audio/video data collected by the pharmacy camera is retrieved directly, the staff can learn from it only that the pharmacy employee had a conversation with the patient, not what was said; the audio data collected by the sound pickup device installed at the pharmacy's dispensing window must also be retrieved separately, making the evidence-collection process cumbersome.
Disclosure of Invention
The purpose of the embodiments of the present application is to provide a video recording device and an audio/video synchronous output method so as to simplify the evidence-collection workflow. The specific technical solutions are as follows:
In a first aspect of the embodiments of the present application, a video recording device is provided. The video recording device is configured to: in response to a display instruction, acquire original audio/video data from the target image acquisition device indicated by the display instruction; parse the original audio/video data to obtain target video data and first audio data; acquire second audio data from the device bound to the target image acquisition device as target audio data; and synchronously display the target video data and the target audio data having the same timestamp. The video recording device has previously established communication connections with at least one sound pickup device and with at least one image acquisition device; the device bound to the target image acquisition device is the target image acquisition device itself or a first sound pickup device, the first sound pickup device being one of the at least one sound pickup device;
The video recording device is further configured to: in response to a switching instruction, terminate the step of acquiring second audio data from the device bound to the target image acquisition device; switch the device bound to the target image acquisition device to the device indicated by the switching instruction; acquire third audio data from the newly bound device as new target audio data; and synchronously display the target video data and the new target audio data having the same timestamp, where the device indicated by the switching instruction is the target image acquisition device itself or a second sound pickup device, the second sound pickup device being one of the at least one sound pickup device other than the first sound pickup device;
the video recording device is further configured to, in response to a confirmation instruction for the target video data and the new target audio data, synchronously output the target video data and the new target audio data to obtain target audio/video data.
In a second aspect of the embodiments of the present application, an audio/video synchronous output method is provided, applied to a video recording device, where the method includes:
In response to a display instruction, acquiring original audio/video data from the target image acquisition device indicated by the display instruction; parsing the original audio/video data to obtain target video data and first audio data; acquiring second audio data from the device bound to the target image acquisition device as target audio data; and synchronously displaying the target video data and the target audio data having the same timestamp;
In response to a switching instruction, terminating the step of acquiring second audio data from the device bound to the target image acquisition device, switching the device bound to the target image acquisition device to the device indicated by the switching instruction, and acquiring third audio data from the newly bound device as new target audio data;
synchronously displaying the target video data and the new target audio data having the same timestamp;
In response to a confirmation instruction for the target video data and the new target audio data, synchronously outputting the target video data and the new target audio data to obtain target audio/video data;
The video recording device has previously established communication connections with at least one sound pickup device and with at least one image acquisition device. The device bound to the target image acquisition device is the target image acquisition device itself or a first sound pickup device; the device indicated by the switching instruction is the target image acquisition device itself or a second sound pickup device; the first sound pickup device is one of the at least one sound pickup device; and the second sound pickup device is one of the at least one sound pickup device other than the first sound pickup device.
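The display / switch / confirm sequence described above can be sketched as a small controller. This is only an illustrative model of the claimed behavior, not the patent's implementation; the class and method names (`Recorder`, `on_display`, `on_switch`, `on_confirm`) and the `parse_av` callback are assumptions introduced for the sketch.

```python
class Recorder:
    """Illustrative sketch of the display/switch/confirm flow."""

    def __init__(self, bindings):
        # bindings: camera id -> bound audio-source id
        # (the camera itself, or one of the pickup devices)
        self.bindings = dict(bindings)
        self.audio_source = None

    def on_display(self, camera, parse_av):
        # Parse the camera's raw A/V into target video and (unused)
        # first audio, then look up the bound audio source.
        video, _first_audio = parse_av(camera)
        self.audio_source = self.bindings[camera]
        return video, self.audio_source

    def on_switch(self, camera, new_source):
        # Stop drawing audio from the old bound device and rebind
        # to the device indicated by the switching instruction.
        self.bindings[camera] = new_source
        self.audio_source = new_source
        return self.audio_source

    def on_confirm(self, video, audio):
        # Synchronously output the confirmed pair as target A/V data.
        return {"video": video, "audio": audio}
```

A usage sketch: bind camera `cam1` to pickup `mic1`, display, switch the audio source to `mic2`, then confirm the combined output.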
The embodiments of the present application have the following beneficial effects:
With the video recording device and the audio/video synchronous output method provided by the embodiments of the present application, after receiving a display instruction the video recording device acquires the original audio/video data of the target image acquisition device indicated by the instruction and parses it to obtain target video data and first audio data; it acquires second audio data from the device bound to the target image acquisition device as target audio data and synchronously outputs the target audio data and the target video data. When a switching instruction is received, it no longer outputs the second audio data of the previously bound device; instead, it switches the device bound to the target image acquisition device to the device indicated by the switching instruction, acquires that device's third audio data as new target audio data, and synchronously plays the target video data and the new target audio data. When a confirmation instruction for the target video data and the new target audio data is received, it synchronously outputs them to obtain the target audio/video data. Because the source of the audio data can be selected and switched as needed when collecting evidence from audio/video data, clear and accurate audio can be obtained simply by switching the audio source whenever the current audio is not clear enough, and target audio/video data with clear audio can then be output, which simplifies the evidence-collection workflow.
Of course, it is not necessary for any one product or method of practicing the application to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art may obtain other drawings from them without creative effort.
FIG. 1a is an exemplary diagram of a connection relationship between a conventional video recording device and a sound pickup device, and between a conventional video recording device and an image pickup device;
FIG. 1b is an exemplary diagram of a connection relationship between a video recording device, a sound pickup device, and an image capturing device according to an embodiment of the present application;
FIGS. 2a-1, 2a-2, and 2a-3 are exemplary diagrams of a video recording device according to an embodiment of the present application, in which a static binding relationship is established between a pickup device and an image capturing device at a GUI end;
FIGS. 2b-1, 2b-2, and 2b-3 are exemplary diagrams of a video recording device according to an embodiment of the present application establishing a static binding relationship between a pickup device and an image capturing device at a Web end;
FIG. 3 is an exemplary diagram of establishing a static binding relationship between a sound pickup apparatus and an image acquisition apparatus and performing data synchronization output according to an embodiment of the present application;
FIG. 4 is a flowchart of a video recording device according to an embodiment of the present application for performing data synchronization output under static binding;
FIGS. 5a and 5b are diagrams illustrating previewing performed by a video recording device according to an embodiment of the present application under dynamic binding of a GUI end interface;
FIG. 6 is a schematic diagram of a first configuration of a video recording apparatus according to an embodiment of the present application;
FIG. 7 is an internal interaction diagram of a video recording device according to an embodiment of the present application when the video recording device previews under dynamic binding of a GUI end;
FIG. 8a is a flowchart of previewing a video recording device according to an embodiment of the present application under dynamic binding at a GUI end;
FIG. 8b is a flowchart of switching audio sources to a camera itself when the video recording device according to the embodiment of the present application previews in a GUI end dynamic binding;
FIG. 8c is a flowchart of switching audio sources to other pickup devices when the video recording device according to the embodiment of the present application previews in dynamic binding at the GUI end;
FIG. 9 is an exemplary diagram of a video recording device previewing under dynamic binding of a Web-side interface according to an embodiment of the present application;
FIG. 10 is an interaction diagram of a video recording device according to an embodiment of the present application when previewing is performed under dynamic binding of a Web terminal;
FIG. 11a is a flowchart of previewing a video recording device according to an embodiment of the present application under dynamic binding of a Web terminal;
FIG. 11b is a flowchart of switching an audio source to a camera itself when a video recording device provided by an embodiment of the present application previews under dynamic binding of a Web end;
FIG. 11c is a flowchart of switching audio sources to other pickup devices when the video recording device according to the embodiment of the present application previews under dynamic binding of the Web terminal;
FIG. 12 is an exemplary diagram of playback performed by a video recording device according to an embodiment of the present application under dynamic binding of a GUI end interface;
FIG. 13 is a schematic diagram of a second configuration of a video recording apparatus according to an embodiment of the present application;
FIG. 14 is an internal interaction diagram of a video recording device according to an embodiment of the present application when the video recording device plays back under the dynamic binding of a GUI end;
FIG. 15a is a flowchart of playback performed by a video recording device according to an embodiment of the present application under dynamic binding at a GUI end;
FIG. 15b is a flowchart of switching audio sources to a camera itself when a video recording device according to an embodiment of the present application plays back with dynamic binding at a GUI end;
FIG. 15c is a flowchart of switching an audio source to another pickup device when the video recording device according to the embodiment of the present application plays back with GUI-side dynamic binding;
FIG. 16 is an interaction diagram of a video recording device according to an embodiment of the present application when playing back the video recording device under dynamic binding of a Web terminal;
FIG. 17a is a flowchart of playback performed by a video recording device according to an embodiment of the present application under dynamic binding of a Web terminal;
FIG. 17b is a diagram illustrating switching of audio sources to a camera itself when a video recording device according to an embodiment of the present application plays back with dynamic binding at a Web end;
FIG. 17c is a diagram illustrating switching of audio sources to other pickup devices when a video recording device according to an embodiment of the present application plays back with dynamic binding on a Web terminal;
FIGS. 18a, 18b, and 18c are exemplary diagrams of exporting video recording devices under dynamic binding of GUI-end interfaces according to embodiments of the present application;
FIG. 19 is a schematic diagram of a third configuration of a video recording apparatus according to an embodiment of the present application;
FIG. 20 is an internal interaction diagram of a video recording device according to an embodiment of the present application when the video recording device is exported under the dynamic binding of a GUI end;
FIG. 21 is a flowchart of exporting video equipment under the dynamic binding of a GUI end provided by an embodiment of the present application;
FIG. 22 is an exemplary diagram of a video recording device according to an embodiment of the present application exported under dynamic binding of a Web-side interface;
FIG. 23 is an interaction diagram of a video recording device according to an embodiment of the present application when export is performed under dynamic binding of a Web terminal;
FIG. 24 is a flowchart of exporting video equipment under dynamic binding of a Web end provided by an embodiment of the present application;
FIG. 25 is a schematic diagram of an audio/video synchronization output method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those skilled in the art without creative effort fall within the scope of protection of the present application.
First, terms of art in the embodiments of the present application will be explained:
GUI (Graphical User Interface): commonly the user interface of a desktop application, comprising elements such as windows, buttons, menus, icons, and text boxes, which provide an intuitive way for a user to interact with software.
Web (World Wide Web): browser-based interactive access to websites or applications over the Internet, without installing dedicated application software on the local computer; data storage and processing are mostly performed on remote servers.
IoT (Internet of Things) module: a module that connects any device to a network through information-sensing hardware according to an agreed protocol, in order to implement functions such as intelligent identification.
PAL (Peripheral Abstraction Layer): an intermediate layer between software and hardware that provides a unified API to upper-layer applications, so that developers need not concern themselves with the implementation details of the underlying hardware and can develop efficiently across different hardware platforms.
ISAPI (Internet Server Application Programming Interface) module: a dynamic link library for extending Web server functionality.
API (Application Programming Interface): an interface that enables interaction and data exchange between two software systems.
RTSP (Real Time Streaming Protocol) module: a module that controls real-time streaming media transmission based on the Real Time Streaming Protocol.
DSP (Digital Signal Processor) module: a chip or processor that implements digital signal processing techniques.
RTP (Real-time Transport Protocol) encapsulation: packaging digital audio and video data according to the packet format defined by the Real-time Transport Protocol, so as to facilitate network transmission.
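As a concrete illustration of RTP encapsulation, the fixed 12-byte RTP header defined by RFC 3550 can be packed in front of an audio or video payload as follows. The payload type (96, a common dynamic type) and the SSRC value are arbitrary example values, not anything specified by this application.

```python
import struct

def rtp_packet(payload: bytes, seq: int, timestamp: int,
               ssrc: int = 0x12345678, payload_type: int = 96) -> bytes:
    """Build a minimal RTP packet: 12-byte fixed header + payload."""
    version = 2                        # RTP version, top 2 bits
    first_byte = version << 6          # padding=0, extension=0, CSRC count=0
    second_byte = payload_type & 0x7F  # marker=0, 7-bit payload type
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload
```

For example, `rtp_packet(b"abc", seq=1, timestamp=1000)` yields a 15-byte packet whose first byte is `0x80` (version 2, no padding or extension).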
In the related art, an image acquisition device capable of acquiring audio/video data is generally connected in communication with a video recording device, and the audio/video data acquired by the image acquisition device is sent to the video recording device for playback or export by the user. However, the audio source and the video source in that data are the same image acquisition device; when the image acquisition device is far from a person in the video picture, effective audio cannot be captured, and if evidence is required, both the video data acquired by the image acquisition device and the audio data acquired by a sound pickup device installed in the same space must be retrieved. For example, as shown in FIG. 1a, a camera is typically used to capture the audio/video data of a space and send it to the video recording device, but the camera can clearly capture sound only within a limited range; the circle drawn with a thin dotted line in FIG. 1a indicates the range within which the camera can clearly capture sound. The camera can thus capture only the voices of people relatively close to it, while the voices of people farther away cannot be clearly captured. The collection range of the sound pickup device is represented by the thick dotted line in FIG. 1a. If the behavior and conversation of person A and person G need to be obtained, the audio of their conversation in the data captured by the camera may be unclear, so the content of the conversation cannot be accurately recovered; in that case, the video data captured by the camera during that time period and the audio data captured by the sound pickup device must each be retrieved separately and combined as forensic data, making the evidence-collection process cumbersome.
In order to simplify the forensic workflow, a first aspect of the embodiments of the present application provides a video recording device that has previously established communication connections with at least one sound pickup device and with at least one image acquisition device. Illustratively, as shown in FIG. 1b, a video recording device 10 has previously established communication with three sound pickup devices 20 (sound pickup devices 20-1, 20-2, 20-3) and two image acquisition devices 30 (image acquisition devices 30-1, 30-2);
The video recording device 10 is configured to, in response to a display instruction, acquire original audio/video data from the target image acquisition device indicated by the display instruction, parse the original audio/video data to obtain target video data and first audio data, acquire second audio data from the device bound to the target image acquisition device as target audio data, and synchronously display the target video data and the target audio data having the same timestamp;
the video recording device 10 is further configured to, in response to a switching instruction, terminate the step of acquiring second audio data from the device bound to the target image acquisition device, switch the bound device to the device indicated by the switching instruction, acquire third audio data from the newly bound device as new target audio data, and synchronously display the target video data and the new target audio data having the same timestamp, where the device indicated by the switching instruction is the target image acquisition device itself or a second sound pickup device, the second sound pickup device being one of the at least one sound pickup device other than the first sound pickup device;
The video recording device 10 is further configured to, in response to a confirmation instruction for the target video data and the new target audio data, synchronously output the target video data and the new target audio data to obtain target audio/video data;
The video recording device has previously established communication connections with at least one sound pickup device and with at least one image acquisition device; the device bound to the target image acquisition device is the target image acquisition device itself or a first sound pickup device, the first sound pickup device being one of the at least one sound pickup device;
It will be appreciated that FIG. 1a and FIG. 1b are only examples; in other possible embodiments, the number of sound pickup devices in communication with the video recording device may be one or more, and the number of image acquisition devices in communication with the video recording device may likewise be one or more. For example, the video recording device may be connected simultaneously to one image acquisition device and one sound pickup device, to two sound pickup devices and four image acquisition devices, or to four sound pickup devices and five image acquisition devices.
Taking a doctor's office as an example, as shown in FIG. 1a, where authorization for information collection has been obtained, a camera and a sound pickup device can be installed in the office. The range of sound that the pickup device can effectively capture is represented by the thick dotted circle, so the pickup device can effectively capture the voices of persons A-G around it. The pickup device is placed on the doctor's desk (i.e., the table in the figure) and is connected in communication with the video recording device, to which it sends the captured audio. The video recording device can then synchronously output the video data captured by the camera together with the audio data captured by the pickup device, yielding audio/video data with clearer audio than that captured by the camera alone.
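Synchronously outputting video and audio "with the same timestamp" can be sketched as pairing each video frame with the pickup-device audio frame whose timestamp matches it most closely, within a tolerance. The frame representation (timestamp-in-milliseconds, data) and the 20 ms default tolerance below are assumptions introduced for illustration only.

```python
import bisect

def sync_pairs(video_frames, audio_frames, tolerance_ms=20):
    """Pair each video frame with the closest-timestamped audio frame.

    Both inputs are lists of (timestamp_ms, data) tuples; audio_frames
    must be sorted by timestamp. Video frames with no audio frame
    within tolerance_ms are dropped from the output.
    """
    audio_ts = [ts for ts, _ in audio_frames]
    pairs = []
    for ts, vdata in video_frames:
        i = bisect.bisect_left(audio_ts, ts)
        # Candidates: the audio frame at/after ts and the one just before.
        best = None
        for j in (i - 1, i):
            if 0 <= j < len(audio_ts) and (
                    best is None
                    or abs(audio_ts[j] - ts) < abs(audio_ts[best] - ts)):
                best = j
        if best is not None and abs(audio_ts[best] - ts) <= tolerance_ms:
            pairs.append((vdata, audio_frames[best][1]))
    return pairs
```

With video frames at 0 ms and 40 ms and audio frames at 2 ms, 39 ms, and 100 ms, the first two video frames pair with the first two audio frames, while a video frame far from any audio timestamp would be dropped.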
With the video recording device provided by the embodiment of the application, the source of the audio data can be selected and switched automatically as required while audio/video data are being acquired. When the audio is not clear enough, clear and accurate audio data can be obtained simply by switching the audio source, so that target audio/video data with clear audio can be output, simplifying the evidence collection process.
The device bound to the target image acquisition device may be the image acquisition device itself or a sound pickup device. When the bound device is the target image acquisition device itself, the video recording device can directly display the original audio/video data from that device.
When the device bound to the target image acquisition device is a sound pickup device, the sound pickup device and the image acquisition device in the embodiment of the application may be located in the same environment or in different environments. For example, when collecting evidence of communication at a bank counter, the camera may be one installed in the bank hall while the sound pickup device is installed at the counter; when collecting evidence in a doctor's office, both may be installed in the office. It can be understood that hospitals, banks, government offices, etc. are only possible application scenarios of the video recording device provided by the embodiment of the application; in other possible embodiments, the device may also be applied to any other scenario with an evidence collection requirement. The embodiment of the present application does not limit this in any way.
The sound pickup device in the embodiment of the application may be a pickup, a microphone, or any other device with a sound signal collection function. The image acquisition device refers to a device for capturing dynamic continuous images, such as a video camera, an ordinary camera, or a smartphone camera. The video recording device refers to a device for recording and storing images and sounds, such as a digital video recorder or a network video recorder.
It can be understood that the number of image acquisition devices that establish a communication connection with the video recording device may be one or more, and likewise for sound pickup devices; the application does not limit this. For example, a video recording device is arranged in a government hall, sound pickup devices are arranged at positions A, B, and C in the hall, cameras are arranged at positions D and E, and the three sound pickup devices and the two cameras each establish a communication connection with the video recording device. The specific process of establishing a communication connection between the sound pickup device and the video recording device is described later and is not repeated here.
Based on the above, in the embodiment of the present application, a binding relationship may be established between the sound pickup device and the image acquisition device. The binding relationship may be stored in advance in a storage module of the video recording device, or it may be carried in an operation instruction input by the user. In the former case, when a display requirement for the video picture of some image acquisition device arises, that is, when a display instruction is received, the instruction need not be parsed again to determine the binding; this case is therefore referred to hereinafter as a static binding relationship. In the latter case, the binding is carried in the display instruction and may change from one display instruction to the next; it is referred to hereinafter as a dynamic binding relationship.
In another example, the user does not statically bind the sound pickup device to the image acquisition device. Instead, when audio data of a certain sound pickup device and video data of a certain image acquisition device need to be exported, the binding relationship between the two can be included in the display instruction issued to the video recording device, and the video recording device obtains the binding relationship from that instruction.
Based on the above, in the embodiment of the present application, the sound pickup device and the image acquisition device may be statically bound one-to-one, or left unbound, in which case the instruction issued when the user has a display requirement carries the binding relationship; the display requirement may be preview, playback, export, or any similar requirement. It will be appreciated that the sound pickup devices and image acquisition devices connected to the video recording device do not necessarily establish bindings one by one. In the example shown in fig. 1b, the sound pickup device 20-1 is bound to the image acquisition device 30-1, the sound pickup device 20-2 is bound to the image acquisition device 30-2, and the sound pickup device 20-3 is not bound to any image acquisition device.
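As an illustrative sketch (not taken from the embodiment itself), the distinction between static and dynamic binding can be expressed as a simple lookup: a dynamic binding carried in the instruction takes precedence, a stored static binding applies otherwise, and a camera without any binding falls back to its own audio. All function and variable names here are hypothetical.

```python
def resolve_audio_source(camera_channel, static_bindings, instruction_binding=None):
    """Return the channel whose audio should accompany this camera's video.

    static_bindings: dict mapping camera channel -> pickup channel,
                     pre-stored in the recorder's storage module.
    instruction_binding: optional binding carried inside a display
                         instruction; overrides the static table.
    """
    if instruction_binding is not None:       # dynamic binding wins
        return instruction_binding
    if camera_channel in static_bindings:     # static binding applies
        return static_bindings[camera_channel]
    return camera_channel                     # fall back to the camera's own audio

static = {"D1": "MIC1", "D2": "MIC2"}
assert resolve_audio_source("D1", static) == "MIC1"
assert resolve_audio_source("D1", static, instruction_binding="MIC3") == "MIC3"
assert resolve_audio_source("D5", static) == "D5"
```

The same lookup serves preview, playback, and export, since each of those instructions may or may not carry its own binding.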
The video recording device can receive original audio/video data from every image acquisition device connected to it, but not every connected image acquisition device has a binding with a sound pickup device. The video recording device is therefore configured, for each target image acquisition device involved in a binding, to parse the original audio/video data from that device to obtain target video data and first audio data, then, according to the binding, acquire second audio data from the device bound to the target image acquisition device to serve as the target audio data, and synchronously display the target audio data and the target video data. For example, as shown in fig. 1b, suppose that among the devices connected to the video recording device 10, only the sound pickup device 20-1 is bound to the image acquisition device 30-1. After receiving the original audio/video data of the image acquisition devices 30-1 and 30-2, the video recording device 10 parses only the data from device 30-1 (since only it has a binding) to obtain its video data, then acquires the second audio data from the sound pickup device 20-1, and synchronously displays the video data of device 30-1 and the second audio data of device 20-1 having the same timestamps.
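The pairing step just described can be sketched as follows: for each camera that appears in a binding, keep its video frames and attach the bound pickup device's audio frames that carry the same timestamp. All names are illustrative, not from the patent text.

```python
def pair_bound_streams(av_streams, audio_streams, bindings):
    """av_streams: camera id -> list of (timestamp, video_frame) tuples.
    audio_streams: pickup id -> {timestamp: audio_frame}.
    bindings: camera id -> bound pickup id.
    Returns camera id -> list of (timestamp, video_frame, audio_frame)."""
    paired = {}
    for cam, pickup in bindings.items():
        audio = audio_streams[pickup]
        paired[cam] = [(ts, vf, audio[ts])
                       for ts, vf in av_streams[cam]
                       if ts in audio]   # only frames with a matching timestamp
    return paired

av = {"30-1": [(1, "v1"), (2, "v2")], "30-2": [(1, "x1")]}
mic = {"20-1": {1: "a1", 2: "a2"}}
out = pair_bound_streams(av, mic, {"30-1": "20-1"})
# camera 30-2 has no binding, so only camera 30-1 is parsed and paired
```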
It can be understood that if the second audio data from the device bound to the target image acquisition device is not clear enough, the source of the audio data can be switched. The video recording device is further configured to, according to a switching instruction, stop receiving the second audio data from the currently bound device, switch the bound device to the device indicated by the switching instruction, obtain third audio data from that device as the new target audio data, and synchronously display the target audio data and the target video data having the same timestamps. For example, suppose the video recording device receives and synchronously outputs, by timestamp, the video data of target image acquisition device 1 and the audio data of sound pickup device 1. If a switching instruction is received, the video recording device no longer receives the audio data acquired by sound pickup device 1, but instead acquires the audio data of sound pickup device 2 and synchronously displays the target video data with the audio data from sound pickup device 2 by timestamp.
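A minimal sketch (with assumed names) of the audio-source switch: on receipt of a switching instruction the recorder stops pulling audio from the old bound device and records the newly indicated device as the source for subsequent timestamp-paired output.

```python
class AudioSource:
    """Tracks which device currently supplies the target audio data."""
    def __init__(self, bound_device):
        self.bound_device = bound_device

    def handle_switch_instruction(self, new_device):
        old = self.bound_device
        self.bound_device = new_device   # stop receiving from old, rebind to new
        return old, new_device

src = AudioSource("pickup-1")
src.handle_switch_instruction("pickup-2")
# from here on, video is paired with pickup-2's audio by timestamp
```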
Further, if the target video data and the audio data from the sound pickup device 2 can serve as valid evidence, they can be synchronously output to obtain the target audio/video data.
Based on this, the video recording device is further configured to output the target video data in synchronization with the new target audio data in response to a confirmation instruction for the target video data and the new target audio data.
In order to more clearly describe the video recording apparatus provided by the embodiment of the present application, the video recording apparatus provided by the embodiment of the present application will be described in detail below with reference to the accompanying drawings.
First, the case of one-to-one static binding between a sound pickup device and an image acquisition device is described in detail:
It can be understood that, in practical application, before using the video recording device to record audio and video, the user can select the device bound to the target image acquisition device according to the clarity of the audio data and establish the binding relationship between the target image acquisition device and the audio source device. The video recording device can then continuously display, by timestamp, the video data from the original audio/video data acquired by the image acquisition device together with the first audio data of the device bound to it.
Thus, in one possible implementation manner, the video recording device is further configured to, in response to a binding instruction input by a user, cache a binding relationship between the image capturing device and the sound pickup device indicated by the binding instruction.
The binding relationship between the image acquisition device and the sound pickup device can be represented by the channel number of each device, by the model and channel number of each device, or by assigning a unique identifier to each image acquisition device and each sound pickup device in advance and using those identifiers. All of these are possible, and the embodiment of the present application is not limited thereto.
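The three representations mentioned above can be sketched as alternative key schemes for the binding table; the field names below are assumptions for illustration only.

```python
def binding_key(device, scheme):
    """Build the key under which a device appears in the binding table."""
    if scheme == "channel":
        return device["channel"]                      # e.g. "IPC 02" or "MIC1"
    if scheme == "model+channel":
        return (device["model"], device["channel"])
    if scheme == "uid":
        return device["uid"]                          # pre-assigned unique identifier
    raise ValueError(f"unknown scheme: {scheme}")

cam = {"channel": "D2", "model": "cam-x", "uid": "id-7"}
assert binding_key(cam, "channel") == "D2"
assert binding_key(cam, "model+channel") == ("cam-x", "D2")
assert binding_key(cam, "uid") == "id-7"
```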
In this embodiment, the original audio/video data and the first audio data may be transmitted as code streams or in other forms. Taking the code stream case as an example, the video recording device parses the original audio/video data acquired by the target image acquisition device to obtain the corresponding video stream, writes that video stream and the second audio stream of the device bound to the target image acquisition device into the same data buffer, and encapsulates the video stream and audio stream in that buffer to obtain and output the target audio/video data.
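An illustrative sketch of the code stream case: the camera's video stream and the bound pickup device's audio stream are written into one data buffer and interleaved by timestamp before encapsulation. All names are assumptions.

```python
def build_target_stream(video_frames, audio_frames):
    """video_frames / audio_frames: lists of (timestamp, payload).
    Returns the shared buffer content, interleaved by timestamp."""
    buffer = [("video", ts, p) for ts, p in video_frames]
    buffer += [("audio", ts, p) for ts, p in audio_frames]
    buffer.sort(key=lambda entry: entry[1])   # encapsulation order = timestamp order
    return buffer

stream = build_target_stream([(10, "v0"), (20, "v1")], [(10, "a0"), (20, "a1")])
# video and audio sharing a timestamp sit adjacent in the shared buffer
```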
It can be understood that the static binding between the sound pickup device and the image acquisition device may be established through the GUI interface of the video recording device, or at the Web end. The Web end may be a Web interface of the video recording device, a platform corresponding to the video recording device, or a client of application software for controlling the video recording device; for convenience of description, the former is referred to hereinafter as the GUI end of the video recording device. Different sound pickup devices or image acquisition devices access the video recording device through different channels, so a device can be referred to by its channel number. For example, a camera connected to the video recording device through the IPC 02 channel can be described as IPC 02, and a sound pickup device connected through the MIC1 channel can be described as MIC1.
Figures 2a-1 to 2a-3 are exemplary diagrams of establishing the binding relationship between the sound pickup device and the image acquisition device at the GUI end. In fig. 2a-1, the audio device management interface displays the channel number, name, address (i.e., IP address), protocol type, port number, security, channel of the associated image acquisition device (i.e., associated camera channel), and the model and status of each sound pickup device. In this interface, the user may delete an already accessed sound pickup device, add a new one, or edit one already connected to the video recording device. After the user clicks the "add" button, a configuration interface pops up, as shown in fig. 2a-2. There the user may enter the protocol type, IP address, port number, user name, and password of the sound pickup device to be added, select the transmission protocol between the sound pickup device and the video recording device to connect the device, select the channel of the image acquisition device to be bound, and choose whether to enable audio timing for the sound pickup device. For example, in fig. 2a-2 the camera channel to be bound is selected as "[D2] camera 1"; clicking the "add" button after configuration completes the binding.
The interface after binding is completed is shown in fig. 2a-3, which displays the channel numbers, IP addresses, protocol types, port numbers, password security, channels of the associated image acquisition devices, and models of all sound pickup devices connected to the video recording device. In the GUI interface, the user may click "export" to export information such as the channel number of each sound pickup device connected to the video recording device, the port used to access it, the device model, and the channel number of the bound image acquisition device into a table.
It can be understood that when the user establishes the binding relationship between the sound pickup device and the image acquisition device at the Web end, the video recording device is connected to a remote control device, which may be a computer, a smartphone, or another electronic device with remote control capability.
Figures 2b-1 to 2b-3 are exemplary diagrams of establishing the binding relationship between the sound pickup device and the image acquisition device at the Web end. In fig. 2b-1, the Web interface displays the channel number, IP address, protocol type, port number, password security, channel of the associated image acquisition device, and model of every sound pickup device connected to the video recording device; the user may delete a connected sound pickup device or add (including quickly add) a new one. After the user clicks the "add" button, a configuration interface pops up, as shown in fig. 2b-2. There the user may enter the IP address, port number, user name, password, and password confirmation of the sound pickup device to be added, select the transmission protocol and the protocol type used to connect the device, select the channel of the image acquisition device to be bound, and optionally enable audio time calibration for the sound pickup device. For example, in fig. 2b-2 the camera channel to be bound is selected as "[D2] camera 1"; clicking the "add" button after configuration completes the binding.
The interface after binding is completed is shown in fig. 2b-3. The Web interface can display information such as the channel number of each sound pickup device in communication connection with the video recording device, the type of the connected device, its status, the port used to access the video recording device, and the channel number of the bound camera. The user can also synchronously preview, in the Web interface, the audio data of a sound pickup device and the video data of the image acquisition device bound to it.
The above describes, from the user operation interface, how the static binding between the sound pickup device and the image acquisition device is established and how data is synchronously output; the following describes the same from the perspective of module interaction inside the video recording device. As shown in fig. 3, when a user establishes a binding relationship (i.e., association relationship) between a sound pickup device and an image acquisition device at the GUI end (or Web end), the channel number of the image acquisition device to be bound is first input at the GUI end (or Web end). The GUI module (or ISAPI module) of the video recording device issues the associated channel to the IOT module, which stores the association and issues it to the PAL layer (i.e., peripheral abstraction layer) for configuring the association and performing channel association management; the binding is thus actually configured by the application layer of the video recording device.
Referring to fig. 3 and 4, the sound pickup device sends audio data via the RTSP module of the video recording device to a download buffer (the pickup device download buffer in fig. 4), and the image acquisition device (network camera) likewise sends its audio/video data via the RTSP module to its own download buffer (the network camera download buffer in fig. 4). The DSP module (the track-closing processing unit in fig. 4) parses the original audio/video data from the download buffer in the parsing library. When, in the course of decoding and encoding, it recognizes that the video recording device stores a binding involving the image acquisition device that produced the data (i.e., it reads the binding information in fig. 4), it discards the audio data in the original audio/video data and retains the video data, then acquires the audio data of the bound sound pickup device from the pickup device buffer. It performs RTP encapsulation on that video data and on the pickup device's audio data in the transfer encapsulation library, combines the RTP-encapsulated video and audio in association channel management to obtain track-closed audio/video data, and encapsulates that data in the transfer encapsulation library to obtain and store the combined audio/video data (the encoded audio/video data in fig. 4).
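The track-closing step described above can be sketched as follows, with an assumed frame layout: the camera's own audio frames are discarded, its video frames are kept, and the bound pickup device's audio frames are merged in by timestamp.

```python
def close_tracks(original_av_frames, pickup_audio_frames):
    """Drop the camera's own audio, keep its video, and merge in the
    bound pickup device's audio, ordered by timestamp."""
    video_only = [f for f in original_av_frames if f["track"] == "video"]
    merged = video_only + pickup_audio_frames
    merged.sort(key=lambda f: f["ts"])   # combined, timestamp-ordered track
    return merged

cam = [{"track": "video", "ts": 1}, {"track": "audio", "ts": 1},
       {"track": "video", "ts": 2}, {"track": "audio", "ts": 2}]
mic = [{"track": "audio", "ts": 1}, {"track": "audio", "ts": 2}]
closed = close_tracks(cam, mic)
# the camera's own audio is gone; each timestamp pairs camera video with mic audio
```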
The DSP module can store the encoded and packaged code stream (i.e., the target audio/video data) in the video buffer for saving to a hard disk or network disk, in the network transmission buffer for remote preview, and in the local GUI decoding buffer (i.e., the local graphical user interface decoding buffer) for GUI decoding and display, so that the user can subsequently play back, export, or download the target audio/video conveniently.
With the video recording device provided by the embodiment of the application, a one-to-one static binding between the sound pickup device and the image acquisition device is established, the audio data in the target audio/video is collected by the device bound to the target image acquisition device, and that bound device is selected by the user according to the clarity of the audio data. The video recording device can therefore obtain and output target audio/video data with clear audio without collecting evidence separately from the image acquisition device and the sound pickup device, simplifying the evidence collection process.
It will be appreciated that the user may also collect evidence without establishing a static binding between the sound pickup device and the image acquisition device, for example while previewing the target audio/video data or while playing back the audio/video data of a certain period. For convenience of description, the process in which the video recording device synchronously outputs video data from the target image acquisition device and audio data from the device bound to it without a static binding is referred to as dynamic binding, and previewing, playback, and downloading in this case are referred to as preview under dynamic binding, playback under dynamic binding, and export under dynamic binding.
It can be understood that, similar to the process of establishing a static binding in the foregoing embodiment, dynamic binding may be established at the GUI end or the Web end of the video recording device. When established at the Web end, the video recording device also establishes a communication connection with the remote control device.
Preview, playback, and export under dynamic binding, as established at the GUI end and the Web end, are described in detail below:
1. Preview under dynamic binding
1. Preview under dynamic binding established at the GUI end
After the user clicks the preview function button in the GUI interface, the interface shown in fig. 5a is displayed. There the user may select the image acquisition device to preview in a channel, for example the video data captured by camera 1. After the user clicks the microphone-style icon, a selection box pops up as shown in fig. 5b, in which the audio options allow the audio of the previewed audio/video to be the audio of the image acquisition device itself (i.e., camera 1) or the audio of sound pickup device 1.
In this case, the video recording device actually receives a preview instruction issued by the user that includes a binding relationship. According to the binding in the preview instruction, it acquires the first audio data from the buffer corresponding to the sound pickup device and the original audio/video data from the buffer corresponding to the image acquisition device, parses the original audio/video data to obtain the video data of the target image acquisition device indicated by the instruction, and then synchronously plays the first audio data of the indicated sound pickup device and the video data of the indicated target image acquisition device for the user to preview.
It will be appreciated that the target image acquisition device and the sound pickup device indicated by the preview instruction have a communication connection with the video recording device in advance. When a sound pickup device or an image acquisition device is connected, the video recording device also takes the buffer storing the audio/video data from the image acquisition device as the buffer corresponding to that device and records it in a data buffer information base, and likewise takes the buffer storing the first audio data from the sound pickup device as the buffer corresponding to that device and records it in the information base.
Audio/video data from different image acquisition devices are stored in different buffers, and first audio data from different sound pickup devices are stored in different buffers. Illustratively, assuming the video recording device establishes communication connections with image acquisition device 1, image acquisition device 2, sound pickup device 1, and sound pickup device 2, the original audio/video data from image acquisition device 1 is buffered in buffer 1, that from image acquisition device 2 in buffer 2, the first audio data from sound pickup device 1 in buffer 3, and that from sound pickup device 2 in buffer 4.
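The "data buffer information base" can be sketched as a registry that gives every connecting device its own buffer, looked up later by device id when serving a preview instruction. Class and method names are hypothetical.

```python
class BufferInfoBase:
    """One dedicated buffer per connected device, found by device id."""
    def __init__(self):
        self._by_device = {}

    def register(self, device_id):
        self._by_device[device_id] = []   # allocate a fresh, separate buffer
        return self._by_device[device_id]

    def lookup(self, device_id):
        return self._by_device[device_id]

base = BufferInfoBase()
for dev in ("camera-1", "camera-2", "pickup-1", "pickup-2"):
    base.register(dev)
base.lookup("camera-1").append("av-frame")
# camera-1's buffer now holds data; the other buffers remain empty
```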
When receiving the preview instruction, the video recording device is further configured to respond by searching the data buffer information base for the buffer corresponding to the image acquisition device indicated by the instruction, as the first data buffer, and for the buffer corresponding to the indicated sound pickup device, as the second data buffer.
The video recording device then parses the original audio/video data of the indicated image acquisition device to obtain its video data, and synchronously plays that video data with the first audio data of the indicated sound pickup device to complete the preview.
With the video recording device provided by the embodiment of the application, the user can preview at the GUI end the first audio data of any sound pickup device together with the video data of any image acquisition device, making it convenient to determine the binding relationship between them according to the preview effect and thereby obtain audio/video pictures with clear audio.
For a more detailed description of previewing under GUI-end dynamic binding, refer to fig. 6, a first structural example diagram of a video recording device according to an embodiment of the present application. The video recording device includes a codec module 61, a digital signal processing module 62, a storage module 63, a decoding library 64, and a data buffer module 65. The storage module 63 is configured to temporarily store the instruction parameters of the preview instruction, and the data buffer module 65 is configured to store the audio/video data from the image acquisition device and the first audio data from the sound pickup device.
The preview flow under GUI-end dynamic binding is shown in fig. 7 and 8a. The user selects a binding relationship through the GUI interface and issues a preview instruction including that binding to the video recording device. The codec module responds by starting the preview and forwarding the binding relationship and the instruction parameters of the preview instruction to the digital signal processing module. The digital signal processing module responds to the preview instruction, the storage module temporarily stores it (i.e., the digital signal processing module receives the preview instruction and stores the binding relationship in the binding information), and the track-closing relationship processing unit of the digital signal processing module searches the data buffer module (i.e., the code stream buffer information base) for the buffer corresponding to the image acquisition device indicated by the instruction, as the first data buffer (i.e., the network camera code stream buffer), and for the buffer corresponding to the indicated sound pickup device, as the second data buffer. The processing unit reads the original audio/video data from the first data buffer and parses it to obtain the video data of the indicated image acquisition device together with a video frame information list, and reads the first audio data of the indicated sound pickup device from the second data buffer and parses it to obtain the parsed first audio data together with an audio frame information list. The frame information includes the frame type, frame data address, frame data length, and global time of each frame.
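The frame information list described above can be sketched as a small record type: each entry stores the frame type, the address and length of the frame data in the buffer, and the frame's global time. The field names are assumptions based on the description.

```python
from dataclasses import dataclass

@dataclass
class FrameInfo:
    frame_type: str    # e.g. "I"/"P" for video frames, "A" for audio frames
    data_address: int  # offset of the frame payload within the buffer
    data_length: int
    global_time: int   # timestamp used to align audio with video

def read_frame(buffer: bytes, info: FrameInfo) -> bytes:
    """Extract one frame's payload using its address and length."""
    return buffer[info.data_address:info.data_address + info.data_length]

buf = b"IFRAMEAFRAME"
video = FrameInfo("I", 0, 6, 100)
audio = FrameInfo("A", 6, 6, 100)
assert read_frame(buf, video) == b"IFRAME"
assert read_frame(buf, audio) == b"AFRAME"
```

Frames whose `global_time` values match are the ones played together in the synchronized preview.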
The digital signal processing module reads the video data matching the frame type and frame data length in the video frame information, reads the parsed first audio data matching the frame type and frame data length in the audio frame information (i.e., reads the code stream data), and sends both to the decoding library (i.e., sends the code stream to the decoding library), which decodes and plays them.
It may be understood that the decoding library and the DSP module in the embodiment of the present application may be two independent components, or the decoding library may be integrated into the DSP module. The same holds for the playback and export schemes under dynamic binding established at the GUI end; the application is not limited in this respect.
In one possible embodiment, the decoding library may be integrated into the DSP module, thereby reducing delay caused by data transmission and improving the processing speed of the DSP module for the code stream.
When a user has a requirement to switch the audio source, namely when the video recording device receives a switching instruction: if the audio source is to be switched to the image acquisition device itself, the video recording device no longer reads, from the second data buffer, the audio data corresponding to the device bound to the target image acquisition device, and instead plays the complete original audio/video data (including its embedded audio) corresponding to the target image acquisition device indicated by the preview instruction;
If the audio source is to be switched to a second pickup device, the video recording device no longer reads, from the second data buffer, the audio data corresponding to the device bound to the target image acquisition device; instead, it searches the data buffer information base of the video recording device for a third data buffer corresponding to the second pickup device, reads third audio data corresponding to the second pickup device from the third data buffer, and synchronously plays, in timestamp order, the video data corresponding to the target image acquisition device indicated by the preview instruction and the third audio data.
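The two switching branches above amount to updating the binding relation and redirecting the audio read to a different buffer. A toy sketch, with hypothetical names and a dictionary standing in for the code stream buffer information base:

```python
def switch_audio_source(binding, camera_id, new_source, buffers):
    """Update the binding relation and return the buffer the audio
    should now be read from.

    binding    : dict camera_id -> current audio source id
                 ("self" meaning the camera's own embedded audio)
    new_source : "self" or a pickup device id
    buffers    : dict device_id -> stream buffer (the buffer information base)
    """
    binding[camera_id] = new_source  # store the new binding relation
    if new_source == "self":
        # play the camera's full audio/video stream; no separate audio buffer
        return buffers[camera_id]
    # look up the buffer of the newly bound pickup device (the third data buffer)
    return buffers[new_source]
```

The previously bound pickup device's buffer is simply no longer read; the video read path is unchanged in both branches.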
As shown in fig. 8B, which is a flowchart of switching the audio source to the camera itself during preview under GUI-end dynamic binding: assume the current video recording device reads video data from camera A and audio data from pickup device B. When the audio source is switched to camera A itself, the user switches the binding relation through the GUI interface, and a switching instruction comprising the new binding relation is issued to the video recording device. The codec module of the video recording device sends the switching instruction to the digital signal processing module (namely, the codec module forwards the binding relation and issues the instruction parameters of the preview instruction). The digital signal processing module receives the modification, stores the binding relation, and sends a notification of the binding relation change to the track-closing relation processing unit of the digital signal processing module (namely, the digital signal track-closing processing unit). The digital signal track-closing processing unit re-reads the binding relation, searches the code stream buffer information base for the corresponding code stream buffer, stops reading the code stream of pickup device B, and instead reads the audio/video code stream of camera A from camera A's code stream buffer and sends the code stream to the decoding library, and the decoding library decodes and plays it.
As shown in fig. 8C, which is a flowchart of switching the audio source to another pickup device during preview under GUI-end dynamic binding: assume the current video recording device reads video data from camera A and audio data from pickup device B. When the audio source is switched to pickup device C, the user switches the binding relation through the GUI interface, and a switching instruction comprising the new binding relation is issued to the video recording device. The codec module of the video recording device sends the switching instruction to the DSP module (namely, the codec module forwards the binding relation and issues the instruction parameters of the preview instruction). The DSP module receives and modifies the binding relation, and sends a notification of the binding relation change to the track-closing relation processing unit of the DSP module (namely, the DSP track-closing processing unit). The DSP track-closing processing unit re-reads the binding relation, searches the code stream buffer information base for the corresponding code stream buffers, stops reading the code stream of pickup device B, reads the video code stream of camera A from camera A's code stream buffer and the audio code stream of pickup device C from pickup device C's code stream buffer, and sends them to the decoding library, which decodes and plays them.
According to the embodiment of the application, the source of the audio data can be changed according to actual demand, so that the video recording device can directly output target audio/video data with clear audio, which simplifies the evidence obtaining process.
Previewing under dynamic binding established by Web end
The preview under dynamic binding by the user at the Web end is shown in fig. 9. After clicking the preview function key in the Web interface, the user may select the video data to preview by clicking the "video source" (for example, the video data collected by camera 1), and may select the audio channel of the previewed audio/video by clicking the "audio source" to be either the image acquisition device itself (i.e., camera 1) or pickup device 1.
In this case, as shown in fig. 10, it is actually the remote control device that receives the preview instruction issued by the user and transmits the preview instruction to the video recording device. The video recording device responds to the preview instruction, acquires the original audio/video data of the image acquisition device indicated by the preview instruction and the first audio data corresponding to the device bound to the target image acquisition device, and sends the original audio/video data and the first audio data to the remote control device.
After receiving the original audio/video data corresponding to the image acquisition device indicated by the preview instruction and the audio data corresponding to the device bound to the target image acquisition device, the remote control device analyzes the original audio/video data to obtain the video data corresponding to the target image acquisition device indicated by the preview instruction, and then synchronously plays, in timestamp order, that video data and the first audio data corresponding to the device bound to the target image acquisition device.
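"Synchronously plays in timestamp order" can be sketched as a timestamp-ordered merge of the two separately sourced frame streams. A minimal model (frame tuples and field order are hypothetical):

```python
import heapq

def interleave_by_timestamp(video_frames, audio_frames):
    """Merge two timestamp-sorted frame lists into a single playback order.

    Each frame is a (timestamp, kind, payload) tuple. Because both inputs
    are already sorted by timestamp, heapq.merge yields the combined stream
    in timestamp order, keeping independently sourced audio and video in sync.
    """
    return list(heapq.merge(video_frames, audio_frames, key=lambda f: f[0]))
```

In a real player the merged sequence would be fed to the decoder and paced against a clock; the merge itself is the synchronization step the text describes.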
In a specific embodiment, as shown in fig. 11a, the remote control device includes a play library and a play library track-closing processing unit. A user selects a binding relation through the Web interface, that is, sends the instruction parameters of the preview instruction to the play library (namely, parameter issuing), and the play library stores the binding relation. The play library track-closing processing unit then reads the binding relation from the binding information and performs streaming from the video recording device through a protocol, reading code stream data from the video recording device, that is, pulling the audio/video code stream corresponding to the image acquisition device indicated by the preview instruction and the audio code stream corresponding to the pickup device indicated by the preview instruction. It then processes the audio/video code stream, reads the video data according to the frame type and removes the embedded audio data, and sends the video code stream data together with the pickup device's audio code stream data to the play library, which decodes and plays them.
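The step of keeping the camera's video frames while dropping its embedded audio (so the bound pickup device supplies the audio track instead) can be sketched as a simple frame-type filter, with hypothetical type labels:

```python
def split_av_stream(frames):
    """Split a camera's muxed audio/video stream by frame type.

    frames: list of (frame_type, payload) tuples. Video frames are kept;
    the camera's embedded audio frames are dropped, because the audio
    track is supplied by the bound pickup device's separate code stream.
    """
    video = [(t, p) for t, p in frames if t.startswith("video")]
    dropped_audio = [(t, p) for t, p in frames if t == "audio"]
    return video, dropped_audio
```

The kept video frames would then be sent to the play library alongside the pickup device's audio code stream.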
It can be seen that the preview under Web-end dynamic binding is similar to the preview under GUI-end dynamic binding. The difference is that, in the Web-end case, the play library in the preset terminal or on the server temporarily stores the binding relation and decodes and plays the video data and audio data, instead of the DSP module doing so, and the played audio data and video data are obtained from the video recording device through a protocol rather than read from the decoding area of the video recording device.
Through the video recording device provided by the embodiment of the application, a user can remotely preview the first audio data corresponding to any pickup device together with the video data corresponding to any image acquisition device, so that the binding relation between the pickup device and the image acquisition device can be conveniently determined according to the preview effect, and an audio/video picture with clear audio can be obtained remotely.
When a user has a requirement to switch the audio source, namely when the remote control device receives a switching instruction: if the audio source is to be switched to the image acquisition device itself, the video recording device, under the control of the remote control device, no longer analyzes the original audio/video data corresponding to the target image acquisition device indicated by the preview instruction and no longer sends the audio data corresponding to the device bound to the target image acquisition device to the remote control device; instead, it sends the original audio/video data corresponding to the target image acquisition device indicated by the preview instruction to the remote control device, so that the remote control device plays the original audio/video data synchronously in timestamp order;
If the audio source is to be switched to a second pickup device, the video recording device, under the control of the remote control device, no longer sends the audio data corresponding to the device bound to the target image acquisition device to the remote control device, and instead sends the audio data corresponding to the second pickup device, so that the remote control device synchronously plays, in timestamp order, the video data corresponding to the target image acquisition device indicated by the preview instruction and the audio data corresponding to the second pickup device.
As shown in fig. 11B, which is a flowchart of switching the audio source to the camera itself during preview under Web-end dynamic binding: assume the current video recording device reads video data from camera A and audio data from pickup device B. When the audio source is switched to camera A itself, the user switches the binding relation through the Web interface and issues the parameters of the new binding relation to the play library. The play library modifies and stores the binding relation and sends a notification of the binding relation change. The play library track-closing processing unit then re-reads the binding relation from the binding information, closes the streaming of pickup device B's audio stream from the video recording device, continues to pull the original audio/video data corresponding to camera A from the video recording device, and finally processes the code stream of the original audio/video data and sends the audio/video data to the play library, which decodes and plays it.
As shown in fig. 11C, which is a flowchart of switching the audio source to another pickup device during preview under Web-end dynamic binding: assume the current video recording device reads video data from camera A and audio data from pickup device B. When the audio source is switched to pickup device C, the user switches the binding relation through the Web interface and issues the parameters of the new binding relation to the play library. The play library modifies and stores the binding relation and sends a notification of the binding relation change. The play library track-closing processing unit then re-reads the binding relation from the binding information, closes the streaming of pickup device B's audio stream from the video recording device, and pulls the audio/video code stream data of camera A and the audio code stream data of pickup device C from the video recording device. It then reads the video data in the audio/video data according to the frame type, eliminates the embedded audio data, and sends the video code stream data together with the audio code stream data of pickup device C to the play library, which decodes and plays them.
According to the embodiment of the application, a user can change the source of the audio data according to actual demand during remote preview, so that the video recording device can directly output target audio/video data with clear audio, which simplifies the evidence obtaining process.
2. Playback under dynamic binding
1. Playback under dynamic binding established at GUI end
As shown in fig. 12, after clicking the playback function key in the GUI interface, the user may select the image acquisition device to be played back in the channel, such as camera 1 or camera 3, may select the time at which the video data to be played back was generated, such as playing back the video data generated on the 24th in 2024, and may select, from the audio options in the interface, whether the audio in the played-back audio/video is the audio of camera 1, the audio of pickup device 1, or the like.
In this case, for the video recording device, the video recording device actually receives a playback instruction including a binding relationship and a time range issued by a user, reads original audio/video data of the image capturing device indicated by the playback instruction in the time range indicated by the playback instruction from the first data buffer according to the binding relationship in the playback instruction, reads first audio data of the pickup device indicated by the playback instruction in the time range from the second data buffer, stores the read original audio/video data in the first decoding buffer, and stores the read first audio data in the second decoding buffer. The first data buffer area is a buffer area corresponding to the image acquisition equipment indicated by the playback instruction, and the second data buffer area is a buffer area corresponding to the pickup equipment indicated by the playback instruction.
The video recording device then reads the original audio/video data from the first decoding buffer, analyzes it, and stores the analyzed audio/video data in a first code stream analysis buffer; reads the first audio data in the time range from the second decoding buffer, analyzes it, and stores the analyzed first audio data in a second code stream analysis buffer; and finally reads, from the first code stream analysis buffer, the video data of the image acquisition device indicated by the playback instruction within the time range, reads, from the second code stream analysis buffer, the first audio data of the pickup device indicated by the playback instruction within the time range, and synchronously plays the video data and the first audio data for the user to watch.
It will be appreciated that, similar to the GUI-side preview solution described above, the image capturing apparatus and the sound pickup apparatus indicated by the playback instruction are also in communication with the video recording apparatus in advance. When the pickup device and the image acquisition device are connected into the video recording device, the video recording device takes a buffer zone for storing audio and video data from the image acquisition device as a buffer zone corresponding to the image acquisition device and records the buffer zone in a data buffer zone information base, and takes a buffer zone for storing first audio data from the pickup device as a buffer zone corresponding to the pickup device and records the buffer zone in the data buffer zone information base. Wherein audio-video data from different image pickup devices are stored in different buffers, and first audio data from different sound pickup devices are stored in different buffers.
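The per-device bookkeeping described above (one dedicated buffer per camera or pickup device, recorded when the device accesses the recorder) can be modeled as a small registry. Class and method names are hypothetical:

```python
class BufferInfoBase:
    """Toy model of the data buffer information base: each image
    acquisition device and each pickup device gets its own buffer,
    registered when the device accesses the video recording device."""

    def __init__(self):
        self._buffers = {}

    def register(self, device_id):
        # Streams from different devices are kept in different buffers.
        self._buffers.setdefault(device_id, [])
        return self._buffers[device_id]

    def append(self, device_id, chunk):
        # Store an incoming code stream chunk in the device's buffer.
        self._buffers[device_id].append(chunk)

    def lookup(self, device_id):
        # How the track-closing unit finds the buffer for a bound device.
        return self._buffers.get(device_id)
```

A preview or playback instruction then resolves each device id in the binding relation to its buffer via `lookup`.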
In order to ensure the smoothness and stability of playback, when receiving a playback instruction, the video recording device responds to the playback instruction, allocates a first decoding buffer zone for storing the code stream for the image acquisition device indicated by the playback instruction, and allocates a second decoding buffer zone for storing the code stream for the pickup device.
Through the video recording equipment provided by the embodiment of the application, a user can play back the first audio data corresponding to any pickup equipment and the video data corresponding to any image acquisition equipment at the GUI end to obtain the audio-video picture with clear audio, so that the possibility of insufficient definition of the audio in the played back video is reduced.
For more detailed description of playback under GUI-side dynamic binding, referring to fig. 13, fig. 13 is a diagram illustrating a second structural example of a video recording device according to an embodiment of the present application, where the video recording device includes a playback module 131, a digital signal processing module 132, a data buffer module 133, a decoding library 134, an analysis buffer module 135, and a data buffer information library 136, the analysis buffer module 135 is configured to temporarily store audio/video data of an image capturing device indicated by a playback instruction and first audio data of a sound pickup device indicated by the playback instruction, and the data buffer information library 136 is configured to record a buffer for storing audio/video data of each image capturing device and a buffer for storing first audio data of each sound pickup device.
The playback flow under the dynamic binding of the GUI end is shown in fig. 14 and 15a, the user selects the binding relationship and the playback time through the GUI interface, and issues a playback instruction including the binding relationship and the time range to the video recording device, the playback module of the video recording device responds to the playback instruction and sends the playback instruction to the DSP module (i.e., the digital signal processing module), and the DSP module responds to the playback instruction and applies for storing the first decoding buffer of the audio/video data of the image acquisition device indicated by the playback instruction and the second decoding buffer of the first audio data of the pickup device indicated by the playback instruction in the data buffer module (i.e., the code stream buffer in fig. 15 a). Then, a buffer area corresponding to the image acquisition device indicated by the playback instruction and a buffer area corresponding to the pickup device are searched in a data buffer area information base of the video recording device (i.e. video information of the webcam and video information of the pickup device are searched from a database in fig. 15 a), audio and video data and first audio data are read from the buffer area corresponding to the image acquisition device indicated by the playback instruction and the buffer area corresponding to the pickup device (i.e. code stream data in a hard disk is read in fig. 15 a), and are respectively stored in the first decoding buffer area and the second decoding buffer area.
After the playback module sends the first track-closing instruction to the DSP module, the DSP module reads the original audio/video data from the first decoding buffer area (i.e., the digital signal processing module in fig. 15a reads the audio/video code stream data of the webcam) and reads the first audio data from the second decoding buffer area (i.e., the digital signal processing module in fig. 15a reads the audio code stream data of the pickup device) in response to the first track-closing instruction, and sends the original audio/video data and the first audio data to the decoding library (i.e., the parsing library).
The decoding library analyzes the original audio and video data (namely, the parsing library parses the code stream) to obtain a parsed video data and video frame information list, the parsed video data and video frame information list is sent to and stored in a first parsing buffer area (namely, the code stream parsing buffer area) of the parsing buffer area module, the first audio data is parsed to obtain a parsed first audio data and audio frame information list, and the parsed first audio data and audio frame information list are sent to and stored in a second parsing buffer area of the parsing buffer area module, wherein the frame information comprises frame types, frame data addresses, frame data lengths and global time of frames.
Finally, the DSP module reads, from the first analysis buffer, the video data matching the frame data address, frame data length, frame type and global time of the frame in the video frame information list, reads the first audio data whose frame global time is the same as that of the video data (namely, the digital signal processing module reads the code stream data according to a strategy), and sends the code stream data to the decoding library for decoding and playing.
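Finding "the first audio data whose frame global time is the same as that of the video data" is a lookup in a time-sorted audio frame list; a minimal sketch using binary search (names hypothetical):

```python
import bisect

def audio_for_video_frame(video_time, audio_infos):
    """Return the audio frame payloads whose global time equals the
    given video frame's global time.

    audio_infos: list of (global_time, payload) tuples sorted by
    global_time; bisect finds the run of frames with a matching stamp.
    """
    times = [t for t, _ in audio_infos]
    lo = bisect.bisect_left(times, video_time)
    hi = bisect.bisect_right(times, video_time)
    return [payload for _, payload in audio_infos[lo:hi]]
```

A production implementation would tolerate small timestamp offsets rather than require exact equality; exact matching keeps the sketch close to the text.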
If the current audio is not clear during playback and there is a requirement to switch the audio source, namely when the video recording device receives a switching instruction: if the audio source is to be switched to the image acquisition device itself, the video recording device no longer reads, from the second data buffer, the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction; instead, it releases the second code stream buffer, stores the original audio/video data of the target image acquisition device within the time range indicated by the playback instruction in the decoding buffer for analysis, reads the original audio/video data from the decoding buffer, and decodes and plays it.
If the second pickup device needs to be switched to, the video recording device releases a second code stream buffer zone, allocates a third code stream buffer zone for storing the code stream for the second pickup device, searches a data buffer zone information base of the video recording device for the third data buffer zone, reads audio data of the second pickup device in a time range indicated by a playback instruction from the third data buffer zone, stores the audio data of the second pickup device in the time range indicated by the playback instruction in the third code stream buffer zone, reads the audio data of the second pickup device in the time range indicated by the playback instruction from the third code stream buffer zone, stores the audio data in a third code stream analysis buffer zone, and decodes and synchronously plays video data of a target image acquisition device indicated by the playback instruction in the time range indicated by the playback instruction and audio data of the second pickup device in the time range indicated by the playback instruction.
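The buffer lifecycle in the pickup-switch branch (release the old pickup device's code stream buffer, allocate a new one for the second pickup device) can be sketched with allocation and release modeled as callables; all names are hypothetical:

```python
def switch_playback_pickup(allocated, old_pickup, new_pickup, alloc, free):
    """Release the second code stream buffer held for the old pickup
    device and allocate a third code stream buffer for the new one.

    allocated : dict device_id -> buffer currently held for playback
    alloc     : callable returning a freshly allocated buffer
    free      : callable releasing a buffer back to the pool
    """
    if old_pickup in allocated:
        free(allocated.pop(old_pickup))  # release the second code stream buffer
    allocated[new_pickup] = alloc()      # third buffer for the second pickup device
    return allocated[new_pickup]
```

The caller would then fill the new buffer with the second pickup device's audio data for the playback time range before resuming synchronized playback.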
As shown in fig. 15B, which is a flowchart of switching audio sources to the camera itself during playback under GUI-side dynamic binding, assuming that the current video recording device reads video data from the camera a and audio data from the pickup device B, when switching audio sources to the camera a itself, the user switches the binding relationship through the GUI interface, and the playback module closes the playback of the pickup device B and releases resources, reads the audio and video code stream data of the camera a from the hard disk, and stores the audio and video code stream data in the code stream buffer. The digital signal processing module reads the audio and video code stream data of the camera A, sends the audio and video data to the analysis library, the analysis library analyzes the audio and video data to obtain an analyzed video data and video frame information list, the analyzed video data and video frame information list is stored in the code stream analysis buffer area, and finally the digital signal processing module reads the video data and the audio data which are matched with the frame data address, the frame data length, the frame type and the global time of the frame in the video frame information list from the code stream analysis buffer area and sends the video data and the audio data to the decoding library for decoding and playing by the decoding library.
Fig. 15C is a flowchart showing that when the GUI end dynamically binds, the audio source is switched to other pickup devices during playback, assuming that the current video recording device reads video data from the camera a, reads audio data from the pickup device B, when the audio source is switched to the pickup device C, the user switches the binding relationship through the GUI interface, the playback module closes the playback of the pickup device B and releases resources, retrieves video information of the pickup device C from the database according to the current video playback time point, reads code stream data of the pickup device C from the hard disk, reads audio and video code stream data of the camera a from the hard disk, and stores the audio and video code stream data of the camera a and the audio code stream data of the pickup device C in the code stream buffer respectively. The digital signal processing module reads the audio and video code stream data of the camera A and the audio code stream data of the pickup device C from the code stream buffer area respectively, the audio and video data are sent to the analysis library, the analysis library analyzes the audio and video data to obtain an analyzed video data and video frame information list, the analyzed video data and video frame information list are stored in the code stream analysis buffer area, the analysis library analyzes the audio data to obtain an analyzed audio data and audio frame information list, the analyzed audio data and audio frame information list are stored in the code stream analysis buffer area, and finally the digital signal processing module reads the video data and audio data (namely, the digital signal processing module reads the code stream data according to a strategy) matched with the frame data address, the frame data length, the frame type and the global time of the frame in the video frame information list from the code stream analysis buffer area and sends the 
video data and the audio data to the decoding library for decoding and playing by the decoding library.
According to the embodiment of the application, the source of the audio data can be changed according to actual demand during playback, so that the video recording device can directly output target audio/video data with clear audio, which simplifies the evidence obtaining process.
Playback under dynamic binding established by Web end
The playback by the user under dynamic binding at the Web end is similar to the preview under dynamic binding at the Web end described above: the user may click the playback function key in the Web interface, select to play back the video data collected by camera 1 or the video data collected by the IPC, and select, from the audio options in the interface, whether the audio is the audio of the image acquisition device itself or the audio of pickup device MIC 1.
In this case, as shown in fig. 16, the playback instruction issued by the user is actually received by the remote control device, and the playback instruction is transmitted to the video recording device. The video recording device responds to the playback instruction, acquires first audio data of the pickup device indicated by the playback instruction in a time range indicated by the playback instruction and original audio/video data of the image acquisition device indicated by the playback instruction in the time range indicated by the playback instruction, and sends the acquired first audio data and the original audio/video data to the remote control device.
After receiving first audio data of the pickup device indicated by the playback instruction in a time range indicated by the playback instruction and original audio/video data of the image acquisition device indicated by the playback instruction in the time range indicated by the playback instruction, the remote control device analyzes the original audio/video data of the image acquisition device indicated by the playback instruction in the time range indicated by the playback instruction to obtain video data of the image acquisition device indicated by the playback instruction in the time range, and then synchronously plays the video data and the first audio data of the pickup device indicated by the playback instruction in the time range.
In a specific embodiment, as shown in fig. 17a, the remote control device includes a play library, where a user selects a binding relationship through a Web interface, that is, sends an instruction parameter of a playback instruction (that is, the binding relationship) to the play library, and the play library retrieves video information of a video camera from a video recording device through a protocol, pulls the video and audio code stream data from the video recording device, and then stores the video and audio code stream data in a code stream buffer area, retrieves video information of a pickup device from the video recording device, pulls the video and audio code stream data from the video recording device, and stores the video and audio code stream data in the code stream buffer area.
The play library reads, from the code stream buffer, the original audio/video data of the camera within the time range indicated by the playback instruction and the first audio data of the pickup device within the time range indicated by the playback instruction (namely, the play library reads the audio/video stream data of the camera and the audio stream data of the pickup device), sends them to the analysis library to analyze the code stream data, stores the analyzed audio/video stream data in a first analysis buffer (namely, a code stream analysis buffer), and stores the analyzed first audio stream data in a second analysis buffer (namely, a code stream analysis buffer). The play library then reads, from the code stream analysis buffer, the video data matching the frame data address, frame data length, frame type and global time of the frame in the video frame information list, reads the first audio data with the same frame global time as the video data, and decodes and plays them (namely, the play library reads the code stream data according to a strategy for decoding and playing).
It can be seen that playback under Web-side dynamic binding is similar to playback under GUI-side dynamic binding, except that in the playback process under Web-side dynamic binding, the video data and the audio data are decoded and played by a play library in the preset terminal or a play library on the server, rather than by the DSP module or the play library of the video recording device, and the played audio data and video data are acquired from the video recording device through a protocol, rather than read from the decoding area of the video recording device.
By the video recording device provided by the embodiment of the application, a user can remotely play back the first audio data corresponding to any pickup device and the video data corresponding to any image acquisition device to obtain the audio-video picture with clear audio, so that the possibility of insufficient definition of the audio in the played back video is reduced.
When a user needs to switch the audio source, that is, when the remote control device receives a switching instruction, if it needs to switch to the image acquisition device itself, then under the control of the remote control device the video recording device no longer parses the original audio and video data, in the time range indicated by the playback instruction, of the target image acquisition device indicated by the playback instruction, and no longer sends to the remote control device the audio data, in the time range indicated by the playback instruction, of the device bound to the target image acquisition device; instead, it sends the original audio and video data corresponding to the target image acquisition device indicated by the playback instruction to the remote control device, so that the remote control device synchronously plays the original audio and video data in the order of the time stamps;
If the video recording device needs to switch to the second pickup device, the video recording device does not send the audio data, in the time range indicated by the playback instruction, of the device bound by the target image acquisition device to the remote control device under the control of the remote control device, but sends the audio data, in the time range indicated by the playback instruction, of the second pickup device to the remote control device, so that the remote control device decodes and synchronously plays the video data, in the time range indicated by the playback instruction, of the target image acquisition device and the audio data, in the time range indicated by the playback instruction, of the second pickup device.
According to the embodiment of the application, a user can change the source of the audio data according to the actual demand in the process of remote playback, so that the video recording device can directly output the target audio and video data with clear audio, and the evidence obtaining flow is simplified.
Fig. 17b is a flowchart of switching the audio source to the camera itself during playback under GUI-side dynamic binding. Assuming that the video recording device currently reads video data from camera A and reads audio data from pickup device B, when the audio source is switched to camera A itself, the user switches the binding relationship through the GUI interface, the play library closes the playback of pickup device B and releases the resources (i.e., the play library stops pulling the code stream of pickup device B), pulls the audio and video code stream data of camera A from the video recording device, and stores it in the code stream buffer. The play library reads the audio and video code stream data of camera A and sends it to the analysis library; the analysis library parses it to obtain the parsed video data and video frame information list and stores them in the code stream analysis buffer; finally, the play library reads the parsed audio and video data of camera A from the code stream analysis buffer and sends it to the decoding library for decoding and playing.
Fig. 17c is a flowchart of switching the audio source to another pickup device during playback under GUI-side dynamic binding. Assuming that the video recording device currently reads video data from camera A and reads audio data from pickup device B, when the audio source is switched to pickup device C, the user switches the binding relationship through the GUI interface, the play library closes the playback of pickup device B and releases the resources (i.e., the play library stops pulling the code stream of pickup device B), retrieves the video recording information of pickup device C in the video recording device according to the current playback time point, reads the audio code stream data of pickup device C from the video recording device, reads the audio and video code stream data of camera A from the video recording device, and stores the audio and video code stream data of camera A and the audio code stream data of pickup device C in the code stream buffer respectively.
The play library reads the audio and video code stream data of camera A and the audio code stream data of pickup device C from the code stream buffer respectively, and sends them to the analysis library. The analysis library parses the audio and video data to obtain the parsed video data and video frame information list and stores them in the code stream analysis buffer, and parses the audio data to obtain the parsed audio data and audio frame information list and stores them in the code stream analysis buffer. Finally, the play library reads, from the code stream analysis buffer, the video data and audio data matching the frame data address, frame data length, frame type and global time of the frames in the frame information lists (that is, the play library reads the code stream data according to the strategy), and sends them to the decoding library for decoding and playing.
It can be understood that, after the switching instruction is received, when the video data and the audio data in the time range indicated by the playback instruction are played, if the user performs a fast-forward or rewind operation, the audio data and video data of the devices indicated by the switching instruction are still presented. For example, the video data from the image acquisition device 1 and the audio data from the pickup device 1 of 9:00-10:00 on September 29 are played synchronously; the switching instruction is received after ten minutes of playing, and the device indicated by the switching instruction is the pickup device 2. At this time, the video recording device synchronously plays the video data from the image acquisition device 1 and the audio data from the pickup device 2 of 9:10-10:00 on September 29; if the user rewinds to the beginning of the video data to replay during playing, the video recording device synchronously plays the video data from the image acquisition device 1 and the audio data from the pickup device 2 of 9:00-10:00 on September 29.
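The behavior above can be modeled as a small playback session whose bound audio source survives seek operations. This is purely an illustrative sketch, not the patent's implementation; class and attribute names are assumptions.

```python
class PlaybackSession:
    """Toy model: the rebinding from a switching instruction persists
    across fast-forward and rewind (seek) operations."""

    def __init__(self, video_device, audio_device, position=0):
        self.video_device = video_device   # e.g. image acquisition device 1
        self.audio_device = audio_device   # currently bound audio source
        self.position = position           # playback offset in seconds

    def switch_audio_source(self, device):
        """Handle a switching instruction: rebind the audio source."""
        self.audio_device = device

    def seek(self, position):
        """Fast-forward or rewind; the bound audio source is unchanged."""
        self.position = position


# Play camera video with pickup device 1; switch to pickup device 2
# after ten minutes; then rewind to the beginning of the recording.
s = PlaybackSession('image acquisition device 1', 'pickup device 1')
s.seek(600)
s.switch_audio_source('pickup device 2')
s.seek(0)  # rewind: audio is still taken from pickup device 2
```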
3. Export under dynamic binding
1. Export under dynamic binding established at GUI end
As shown in fig. 18a, the user clicks "file management" in the GUI interface to enter the video export interface. By clicking "video", the source of the video data to be exported can be selected in the channel; for example, in fig. 18a the image acquisition device on channel D2 is selected as the source of the exported video data. The start and stop time of the video data to be exported can also be selected, for example, exporting the video data in the time period 12:22:10-12:23:26. The user may also check "associated audio backup" in the interface to determine whether the exported video data is to be associated with audio.
After selecting the audio-video data to be exported and choosing the associated audio backup, clicking the "backup" button to display the interface as shown in fig. 18b, in which the start-stop time of the video data to be exported is displayed, the user may select the source of audio in the audio-video data to be exported as the audio data from channel D2 in the interface, for example, the associated audio in fig. 18b, and clicking the "ok" to bind the corresponding image capturing device to the pickup device, and displaying the interface as shown in fig. 18 c.
In the interface shown in fig. 18c, the user may select the export path for exporting the audio and video data composed of the video data of the image acquisition device and the audio data of the pickup device. The interface may also display information such as the memory space of the target storage device and the name, size, type and modification date of the exported audio and video data. The user may also create a new folder on the target storage device during export. After the "confirm" button is clicked, the audio and video data selected by the user is exported to the target storage device selected by the user in the format edited by the user, thereby completing the export of the data.
In this case, for the video recording apparatus, it is actually the video recording apparatus that receives the export instruction including the binding relationship issued by the user, and reads, from the first data buffer, the original audio-video data of the image capturing apparatus indicated by the export instruction in the time range indicated by the export instruction according to the binding relationship in the export instruction, and reads, from the second data buffer, the first audio data of the pickup apparatus indicated by the export instruction in the time range indicated by the export instruction. The first data buffer area is a buffer area corresponding to the image acquisition equipment indicated by the export instruction, and the second data buffer area is a buffer area corresponding to the pickup equipment indicated by the export instruction.
In order to ensure the efficiency and stability of data export, when receiving an export instruction, the video recording device responds to the export instruction, allocates a first decoding buffer for storing the code stream for the image acquisition device indicated by the export instruction, allocates a second decoding buffer for storing the code stream for the pickup device, and allocates a video export buffer for storing the audio and video data after track combination.
Then the video recording device stores the original audio and video data, in the time range indicated by the export instruction, of the image acquisition device indicated by the export instruction into the first decoding buffer, and stores the first audio data, in the time range indicated by the export instruction, of the pickup device indicated by the export instruction into the second decoding buffer. The video recording device then reads the original audio and video data from the first decoding buffer, parses it to obtain the parsed video data and video frame information list, and stores them in a first code stream analysis buffer; it reads the first audio data from the second decoding buffer, parses it to obtain the parsed first audio data and audio frame information list, and stores them in a second code stream analysis buffer. That is, the first code stream analysis buffer is used for storing the parsed video data, and the second code stream analysis buffer is used for storing the parsed first audio data.
It will be appreciated that, similar to the GUI-side preview solution described above, the image acquisition device and the pickup device indicated by the export instruction are also pre-communicatively connected to the video recording device. When the pickup device and the image acquisition device are connected to the video recording device, the video recording device takes a buffer for storing the audio and video data from the image acquisition device as the buffer corresponding to the image acquisition device and records it in the data buffer information library, and takes a buffer for storing the first audio data from the pickup device as the buffer corresponding to the pickup device and records it in the data buffer information library. Audio and video data from different image acquisition devices are stored in different buffers, and first audio data from different pickup devices are stored in different buffers.
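The per-device buffer bookkeeping described above can be sketched as a minimal registry. This is an assumption-laden illustration, not the patent's data buffer information library: device identifiers are strings and each "buffer" is simply a Python list.

```python
class DataBufferInfoLibrary:
    """Toy registry: each connected device gets its own dedicated buffer,
    and the library records which buffer belongs to which device so that
    preview, playback or export can look the buffer up later."""

    def __init__(self):
        self._buffers = {}  # device id -> dedicated buffer (a list here)

    def register_device(self, device_id):
        """Allocate and record a dedicated buffer for a newly connected device."""
        self._buffers[device_id] = []
        return self._buffers[device_id]

    def buffer_for(self, device_id):
        """Find the buffer corresponding to a device (e.g. for an export)."""
        return self._buffers[device_id]


lib = DataBufferInfoLibrary()
cam_buf = lib.register_device('camera-D2')   # image acquisition device
mic_buf = lib.register_device('pickup-B')    # sound pickup device
cam_buf.append(b'av-frame-1')                # AV data from the camera
mic_buf.append(b'audio-frame-1')             # first audio data from the pickup
```

Because every device registers its own list, data from different cameras and different pickup devices never share a buffer, matching the constraint stated in the text.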
The target storage device may be a device with a storage function, such as a USB disk or a hard disk, which is not limited in the present application.
By the video recording equipment provided by the embodiment of the application, a user can export the first audio data corresponding to any pickup equipment and the video data corresponding to any image acquisition equipment at the GUI end to obtain the audio-video picture with clear audio, so that the possibility of insufficient definition of the audio in the exported video is reduced.
In order to describe export under GUI-side dynamic binding more specifically, referring to fig. 19, fig. 19 is a third structural example diagram of a video recording device provided by an embodiment of the present application. The video recording device includes a backup module 191, a digital signal processing module 192, a data buffer module 193, a decoding library 194, an analysis buffer module 195, a data buffer information library 196, and a video export buffer module 197. The analysis buffer module 195 is used for temporarily storing the audio and video data of the image acquisition device indicated by the export instruction and the first audio data of the pickup device indicated by the export instruction; the data buffer information library 196 is used for recording the buffer storing the audio and video data of each image acquisition device and the buffer storing the first audio data of each pickup device; the video export buffer module 197 is used for combining the tracks of the video data, in the time range indicated by the export instruction, of the image acquisition device indicated by the export instruction and the first audio data, in the time range indicated by the export instruction, of the pickup device indicated by the export instruction, so as to obtain and store the track-combined audio and video data.
The export flow under GUI-side dynamic binding is shown in fig. 20 and fig. 21.
The user selects the binding relationship and the export time through the GUI interface, and issues an export instruction including the binding relationship to the video recording device. The backup module of the video recording device responds to the export instruction and sends it to the DSP module (namely, the digital signal processing module). The DSP module responds to the export instruction and applies, in the data buffer module, for a first decoding buffer for storing the audio and video data of the image acquisition device indicated by the export instruction and a second decoding buffer for storing the first audio data of the pickup device indicated by the export instruction (namely, the code stream buffers in fig. 21). It then searches the data buffer information library of the video recording device for the buffer corresponding to the image acquisition device indicated by the export instruction and the buffer corresponding to the pickup device (namely, in fig. 21, retrieving the video recording information of the network camera and the video recording information of the pickup device from the database), reads the audio and video data and the first audio data from those buffers (namely, reading the code stream data in the hard disk in fig. 21), and stores them in the first decoding buffer and the second decoding buffer respectively.
After the backup module sends the first track combination instruction to the DSP module, the DSP module, in response to the first track combination instruction, reads the original audio and video data from the first decoding buffer (i.e., in fig. 21 the digital signal processing module reads the audio and video code stream data of the network camera), reads the first audio data from the second decoding buffer (i.e., in fig. 21 the digital signal processing module reads the audio code stream data of the pickup device), and sends the original audio and video data and the first audio data to the decoding library (i.e., the parsing library).
The decoding library analyzes the original audio and video data (namely, the parsing library parses the code stream) to obtain a parsed video data and video frame information list, the parsed video data and video frame information list is sent to and stored in a first parsing buffer area (namely, the code stream parsing buffer area) of the parsing buffer area module, the first audio data is parsed to obtain a parsed first audio data and audio frame information list, and the parsed first audio data and audio frame information list are sent to and stored in a second parsing buffer area of the parsing buffer area module, wherein the frame information comprises frame types, frame data addresses, frame data lengths and global time of frames.
Finally, the DSP module reads, from the first analysis buffer, the video data matching the frame data address, frame data length, frame type and global time of the frames in the video frame information list, reads the first audio data whose frame global time is the same as that of the video data (namely, the DSP module reads the code stream data according to the strategy), and sends the read video data and first audio data to the video export buffer for track combination to obtain the track-combined audio and video data. The backup module then reads the track-combined audio and video data from the video export buffer, encapsulates it to obtain a video file, and sends the video file to the target storage device.
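The track-combination step above can be illustrated as interleaving the two parsed frame streams by global time into a single muxed sequence ready for encapsulation. This is a hedged sketch under simplifying assumptions: frames are `(global_time, track, payload)` tuples, both inputs are already sorted by time, and the "mux" is a plain merge rather than a real container format.

```python
import heapq

def combine_tracks(video_frames, audio_frames):
    """Merge two time-sorted frame lists into one time-ordered track.

    heapq.merge compares the tuples lexicographically, so frames come out
    ordered by global time; at equal times the track name breaks the tie.
    """
    return list(heapq.merge(video_frames, audio_frames))


# Parsed frames read from the first and second analysis buffers.
video = [(1000, 'video', b'V0'), (1040, 'video', b'V1')]
audio = [(1000, 'audio', b'A0'), (1040, 'audio', b'A1')]
muxed = combine_tracks(video, audio)  # contents of the video export buffer
```

A real exporter would then encapsulate `muxed` into a container such as MP4 or AVI before writing it to the target storage device.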
2. Export under dynamic binding established at Web end
As shown in fig. 22, the user may select an image capturing device that needs to export video data (remote export, hereinafter referred to as downloading) in a channel of a download interface of the Web side, for example, select the camera 1, select a start-stop time of the downloaded video data, select a source of audio data associated with the downloaded video data, or select audio data not associated with the downloaded video data, and directly use the audio data of the image capturing device itself. The type of file to be downloaded may also be selected among "file types", for example, "all" means downloading audio-video data, "video" means downloading only video data, and "audio" means downloading only audio data.
In this case, as shown in fig. 23, the remote control device actually receives the export instruction issued by the user and transmits the export instruction to the video recording device. The video recording device responds to the export instruction, acquires original audio-video data of the image acquisition device indicated by the export instruction in the time range indicated by the export instruction and first audio data of the pickup device indicated by the export instruction (namely, the pickup device bound with the image acquisition device indicated by the export instruction) in the time range indicated by the export instruction, and sends the read first audio data and the original audio-video data to the remote control device.
After receiving the first audio data of the pickup device indicated by the export instruction in the time range indicated by the export instruction and the original audio/video data of the image acquisition device indicated by the export instruction in the time range indicated by the export instruction, the remote control device analyzes the original audio/video data of the image acquisition device indicated by the export instruction in the time range indicated by the export instruction to obtain video data of the image acquisition device indicated by the export instruction in the time range indicated by the export instruction, then outputs the video data obtained by analysis and the first audio data of the pickup device indicated by the export instruction in the time range indicated by the export instruction as a video file, and finally the backup module sends the video file to the target storage device.
It can be understood that the play library of the remote control device may output the video data corresponding to the image capturing device indicated by the parsed export instruction and the first audio data corresponding to the pickup device indicated by the export instruction as temporary video files, and then generate a final video file in the play library and delete the temporary video file in the play library. The video file may be in MP4 format, AVI format, or other formats.
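The temporary-file pattern described above can be sketched as follows: stage the muxed data in a temporary file, then publish the final video file so the temporary one disappears. Paths and the trivial "encapsulation" are illustrative stand-ins for real MP4/AVI muxing; nothing here is the patent's actual code.

```python
import os
import tempfile

def export_to_file(muxed_frames, final_path):
    """Write frames to a temporary file, then produce the final video file."""
    fd, tmp_path = tempfile.mkstemp(suffix='.tmp')
    with os.fdopen(fd, 'wb') as tmp:
        for frame in muxed_frames:       # stage everything in the temp file
            tmp.write(frame)
    os.replace(tmp_path, final_path)     # publish the final file atomically;
    return final_path                    # the temporary file no longer exists


out = export_to_file(
    [b'hdr', b'frame'],
    os.path.join(tempfile.gettempdir(), 'clip.mp4'),
)
```

Using an atomic rename means a reader never observes a half-written final file, which is one plausible reason for the temporary-file-then-delete flow the text describes.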
In a specific embodiment, as shown in fig. 24, the remote control device includes a play library, where a user selects a binding relationship through a Web interface, that is, sends instruction parameters (that is, a binding relationship) of a export instruction to the play library, and the play library retrieves video information of the network camera a from the video recording device, pulls audio and video code stream data of the network camera a, and stores the video information in a first decoding buffer (code stream buffer), retrieves video information of the pickup device from the video recording device, pulls audio code stream data of the pickup device B, and stores the video information in a second decoding buffer (code stream buffer).
The playing library reads the audio and video code stream data of the network camera and the audio code stream data of the pickup device from the code stream buffer, sends the read original audio and video data and the first audio data into the analysis library to analyze the code stream data, outputs the analyzed audio and video code stream data into a temporary video file, and outputs the analyzed first audio code stream data into the temporary video file. And then the playing library reads the code stream data according to the strategy and sends the code stream data to the video guide-out buffer area for track combination, and audio and video data after track combination is obtained. And the play library reads the audio and video data after the track combination from the video export buffer zone through a protocol, and exports the audio and video data into a file after the encapsulation, so as to obtain a video file.
Export under Web-side dynamic binding is similar to export under GUI-side dynamic binding, with the following differences: 1. in the export process under Web-side dynamic binding, the video data and the audio data are decoded by the play library of the preset terminal or the play library on the server, rather than by the DSP module or the play library of the video recording device; 2. in the export process under Web-side dynamic binding, the audio data and video data are acquired from the video recording device by the play library of the preset terminal through a protocol, rather than read from the decoding area of the video recording device; 3. in the export process under Web-side dynamic binding, the play library of the preset terminal directly outputs the parsed data as video files without caching it in an analysis buffer.
It may be understood that in the embodiment of the present application, the export under the dynamic binding of the Web side and the export under the dynamic binding of the GUI side may be directly exporting the data to be exported as a video file, or may be caching the data to be exported as a temporary file, and then exporting the temporary file, which is not limited in this application.
Through the video recording equipment provided by the embodiment of the application, a user can remotely download or export the first audio data corresponding to any pickup equipment and the video data corresponding to any image acquisition equipment to obtain the audio-video picture with clear audio, so that the possibility of unclear audio is reduced.
In a second aspect of the embodiment of the present application, as shown in fig. 25, an audio and video synchronization output method is provided, and is applied to a video recording device, where the method includes:
step S2501, in response to a display instruction, obtaining original audio and video data from the target image acquisition device indicated by the display instruction, parsing the original audio and video data to obtain target video data and first audio data, obtaining second audio data from the device bound to the target image acquisition device as target audio data, and synchronously displaying the target video data and the target audio data with the same time stamp;
Step S2502, in response to a switching instruction, terminating the step of acquiring the second audio data from the device bound to the target image acquisition device, switching the device bound to the target image acquisition device to the device indicated by the switching instruction, and acquiring third audio data from the newly bound device as new target audio data;
Step S2503, in response to a confirmation instruction for the target video data and the new target audio data, synchronizing the target video data and the new target audio data to obtain target audio-video data;
The video recording equipment is in communication connection with at least one pickup equipment in advance, and in communication connection with at least one image acquisition equipment in advance, equipment bound by the target image acquisition equipment is target image acquisition equipment or first pickup equipment, equipment indicated by a switching instruction is target image acquisition equipment or second pickup equipment, the first pickup equipment is one pickup equipment in the at least one pickup equipment, and the second pickup equipment is one pickup equipment except the first pickup equipment in the at least one pickup equipment.
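Steps S2501-S2503 above can be condensed into a small state model: present audio from the bound device, rebind on a switching instruction, and synchronize video with the new audio on confirmation. This is an illustrative model only; class and method names are assumptions, not the claimed implementation.

```python
class Recorder:
    """Toy model of the claimed flow for a single target camera."""

    def __init__(self, target_camera, bound_device):
        self.target_camera = target_camera
        self.bound_device = bound_device  # first pickup device or the camera itself
        self.target_audio = None

    def show(self):
        """S2501: take audio from the bound device as the target audio data."""
        self.target_audio = f'audio from {self.bound_device}'

    def switch(self, device):
        """S2502: stop reading the old source and rebind to the new device."""
        self.bound_device = device
        self.target_audio = f'audio from {device}'

    def confirm(self):
        """S2503: synchronize target video and new target audio."""
        return (f'video from {self.target_camera}', self.target_audio)


r = Recorder('camera 1', 'pickup device 1')
r.show()
r.switch('pickup device 2')
result = r.confirm()
```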
According to the audio and video synchronous output method provided by the embodiment of the application, when audio and video data is used for evidence obtaining, the source of the audio data can be freely selected and switched as required, so that when the audio is not clear enough, clear and accurate audio data can be obtained simply by switching the audio source; the target audio and video data with clear audio can thus be directly output, and the evidence obtaining flow is simplified.
In one possible implementation, the presentation instruction is a preview instruction, and the method further includes:
The method comprises the steps of taking a buffer area for storing original audio and video data from an image acquisition device as a buffer area corresponding to the image acquisition device and recording the buffer area in a data buffer area information base, taking the buffer area for storing audio data from a pickup device as the buffer area corresponding to the pickup device and recording the buffer area in the data buffer area information base, wherein the audio and video data from different image acquisition devices are stored in different buffer areas, and the audio data from different pickup devices are stored in different buffer areas;
Acquiring the original audio and video data from the target image acquisition device indicated by the presentation instruction and the second audio data from the device bound to the target image acquisition device comprises:
Searching a first data buffer and a second data buffer in the data buffer information base of the video recording device in response to the preview instruction, reading the original audio and video data corresponding to the target image acquisition device indicated by the preview instruction from the first data buffer, and reading the audio data corresponding to the device bound to the target image acquisition device from the second data buffer, wherein the first data buffer is the buffer corresponding to the target image acquisition device indicated by the preview instruction, and the second data buffer is the buffer corresponding to the device bound to the target image acquisition device;
Synchronously displaying the target video data and the target audio data with the same time stamp comprises:
And synchronously playing the video data corresponding to the target image acquisition equipment indicated by the preview instruction and the audio data corresponding to the equipment bound by the target image acquisition equipment according to the sequence of the time stamps.
In one possible implementation, the device indicated by the switching instruction is an image acquisition device, and the method further comprises:
And responding to the switching instruction, stopping executing the step of reading the audio data corresponding to the equipment bound by the target image acquisition equipment from the second data buffer area, analyzing the original audio and video data corresponding to the target image acquisition equipment indicated by the preview instruction, and playing the original audio and video data corresponding to the target image acquisition equipment indicated by the preview instruction.
In a possible embodiment, the device indicated by the switching instruction is a second sound pickup device, and the method further comprises:
In response to the switching instruction, terminating the steps of reading the audio data corresponding to the device bound to the target image acquisition device from the second data buffer and synchronously playing, in the order of the time stamps, the video data corresponding to the target image acquisition device indicated by the preview instruction and the audio data corresponding to the device bound to the target image acquisition device, searching a third data buffer corresponding to the second pickup device in the data buffer information base of the video recording device, and reading the third audio data corresponding to the second pickup device from the third data buffer;
Synchronously displaying the target video data and the target audio data with the same time stamp comprises:
and synchronously playing the video data corresponding to the target image acquisition equipment indicated by the preview instruction and the third audio data according to the sequence of the time stamps.
In one possible implementation, the presentation instruction is a playback instruction, and the method further includes:
The method comprises the steps of taking a buffer area for storing original audio and video data from an image acquisition device as a buffer area corresponding to the image acquisition device and recording the buffer area in a data buffer area information base, taking the buffer area for storing audio data from a pickup device as the buffer area corresponding to the pickup device and recording the buffer area in the data buffer area information base, wherein the audio and video data from different image acquisition devices are stored in different buffer areas, and the audio data from different pickup devices are stored in different buffer areas;
In response to the playback instruction, a first code stream buffer for storing a code stream is allocated to the target image acquisition device indicated by the playback instruction, and a second code stream buffer for storing a code stream is allocated to the sound pickup device; the first data buffer and the second data buffer are searched for in the data buffer information base of the video recording device; the original audio and video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction are read from the first data buffer and stored in the first code stream buffer; the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction are read from the second data buffer and stored in the second code stream buffer; wherein the first data buffer is the buffer corresponding to the target image acquisition device indicated by the playback instruction, and the second data buffer is the buffer corresponding to the device bound to the target image acquisition device;
Acquiring the original audio and video data from the target image acquisition device indicated by the presentation instruction and the second audio data from the device bound to the target image acquisition device comprises:
reading the original audio and video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction from the first code stream buffer and storing them in a first code stream parsing buffer; reading the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction from the second code stream buffer and storing them in a second code stream parsing buffer;
Parsing the original audio and video data to obtain the target video data and the first audio data comprises:
parsing the original audio and video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction, to obtain the video data of the target image acquisition device indicated by the playback instruction within that time range;
Acquiring the original audio and video data from the target image acquisition device indicated by the presentation instruction and the second audio data from the device bound to the target image acquisition device comprises:
reading the video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction from the first code stream parsing buffer, and reading the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction from the second code stream parsing buffer;
Synchronously presenting the target video data and the target audio data having the same timestamp comprises:
decoding and synchronously playing the video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction and the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction.
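The playback flow above, looking up per-device buffers in a data buffer information base and copying only the records inside the requested time range into freshly allocated code stream buffers, can be sketched as follows. All names (`buffer_info_base`, `fill_stream_buffer`, device identifiers) are illustrative assumptions, not part of the patent.

```python
# Illustrative sketch (assumed names, not the patented implementation): a
# "data buffer information base" maps each device to its data buffer, and a
# playback request copies only the records inside the requested time range
# into a freshly allocated code stream buffer.
buffer_info_base = {
    "cam1": [(10, b"av0"), (20, b"av1"), (30, b"av2")],  # raw A/V records
    "mic1": [(10, b"a0"), (20, b"a1"), (30, b"a2")],     # pickup audio records
}

def fill_stream_buffer(device_id, start, end):
    """Allocate a stream buffer holding the device's records in [start, end]."""
    data_buffer = buffer_info_base[device_id]  # lookup in the information base
    return [rec for rec in data_buffer if start <= rec[0] <= end]

# Playback instruction for cam1 over timestamps 10..20; mic1 is bound to cam1.
first_stream_buffer = fill_stream_buffer("cam1", 10, 20)
second_stream_buffer = fill_stream_buffer("mic1", 10, 20)
```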
In one possible implementation, the device indicated by the switching instruction is an image acquisition device, and the method further comprises:
In response to the switching instruction, terminating execution of the steps of reading, from the second data buffer, the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction and storing the original audio and video data of the target image acquisition device within the time range indicated by the playback instruction in the first code stream parsing buffer, and releasing the second code stream buffer; storing the original audio and video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction in a decoding buffer;
The method further comprises the steps of:
reading the original audio and video data from the decoding buffer, and decoding and playing the original audio and video data.
In a possible embodiment, the device indicated by the switching instruction is a second sound pickup device, and the method further comprises:
In response to the switching instruction, releasing the second code stream buffer, and allocating a third code stream buffer for storing a code stream to the second sound pickup device; searching for a third data buffer in the data buffer information base of the video recording device; reading the audio data of the second sound pickup device within the time range indicated by the playback instruction from the third data buffer and storing them in the third code stream buffer; wherein the third data buffer is the buffer corresponding to the second sound pickup device;
reading the audio data of the second sound pickup device within the time range indicated by the playback instruction from the third code stream buffer and storing them in a third code stream parsing buffer;
Synchronously presenting the target video data and the target audio data having the same timestamp comprises:
decoding and synchronously playing the video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction and the audio data of the second sound pickup device within the time range indicated by the playback instruction.
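The pickup-switch step, releasing the old audio stream buffer and refilling a new one from the second pickup's data buffer for the same playback time range, can be sketched as below. Class and method names are illustrative assumptions; releasing and reallocating the code stream buffer is modeled simply as replacing a list reference.

```python
# Illustrative sketch (assumed names): switching the bound device to a second
# sound pickup device releases the old audio stream buffer and refills a new
# one from the second pickup's data buffer for the same playback time range.
class PlaybackSession:
    def __init__(self, info_base, start, end):
        self.info_base = info_base          # per-device data buffers
        self.start, self.end = start, end   # playback time range
        self.audio_stream_buffer = None

    def bind_pickup(self, pickup_id):
        # Dropping the previous list reference stands in for releasing the
        # old code stream buffer before allocating the new one.
        self.audio_stream_buffer = [
            rec for rec in self.info_base[pickup_id]
            if self.start <= rec[0] <= self.end
        ]

info = {"mic1": [(10, b"a1")], "mic2": [(10, b"b1"), (40, b"b2")]}
session = PlaybackSession(info, 0, 30)
session.bind_pickup("mic1")   # initially bound sound pickup device
session.bind_pickup("mic2")   # switching instruction: rebind to second pickup
```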
An embodiment of the present application also provides an electronic device, comprising a processor and a memory;
the processor is configured to implement the steps of any of the above audio and video synchronous output methods when executing a program stored in the memory.
The memory may include random access memory (RAM), and may also include non-volatile memory (NVM), such as at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present application, a computer-readable storage medium is provided, in which a computer program is stored; when the computer program is executed by a processor, the steps of any one of the above audio and video synchronous output methods are implemented.
In yet another embodiment of the present application, a computer program product containing instructions is also provided; when it runs on a computer, it causes the computer to perform any of the audio and video synchronous output methods of the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center containing an integration of one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a solid state disk (SSD), among others.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (21)

1.一种录像设备,其特征在于,所述录像设备用于,响应于展示指令,获取来自于所述展示指令所指示的目标图像采集设备的原始音视频数据,对所述原始音视频数据进行解析,得到目标视频数据和第一音频数据;获取来自于所述目标图像采集设备绑定的设备的第二音频数据,作为目标音频数据;同步展示时间戳相同的所述目标视频数据与目标音频数据;其中,所述录像设备预先与至少一个拾音设备建立有通信连接,且预先与至少一个图像采集设备建立有通信连接,所述目标图像采集设备绑定的设备为所述目标图像采集设备或第一拾音设备,所述第一拾音设备为所述至少一个拾音设备中的一个拾音设备;1. A video recording device, characterized in that the video recording device is used to, in response to a display instruction, obtain original audio and video data from a target image acquisition device indicated by the display instruction, parse the original audio and video data to obtain target video data and first audio data; obtain second audio data from a device bound to the target image acquisition device as target audio data; and synchronously display the target video data and target audio data with the same timestamp; wherein the video recording device has established a communication connection with at least one sound pickup device in advance, and has established a communication connection with at least one image acquisition device in advance, the device bound to the target image acquisition device is the target image acquisition device or the first sound pickup device, and the first sound pickup device is one of the at least one sound pickup devices; 所述录像设备还用于,响应于切换指令,终止执行所述获取来自于所述目标图像采集设备绑定的设备的第二音频数据的步骤,将所述目标图像采集设备绑定的设备切换为所述切换指令所指示的设备,获取来自于所述目标图像采集设备绑定的设备的第三音频数据,作为新的目标音频数据;同步展示时间戳相同的所述目标视频数据与所述新的目标音频数据;其中,所述切换指令所指示的设备为所述目标图像采集设备或第二拾音设备,所述第二拾音设备为所述至少一个拾音设备中除所述第一拾音设备外的一个拾音设备;The video recording device is further used to, in response to a switching instruction, terminate the step of acquiring the second audio data from the device bound to the target image acquisition device, switch the device bound to the target image acquisition device to the device indicated by the switching instruction, acquire the third audio data from the device bound to the target image acquisition device as new target audio data; and synchronously display the target video data and the new target audio data with the 
same timestamp; wherein the device indicated by the switching instruction is the target image acquisition device or the second sound pickup device, and the second sound pickup device is a sound pickup device other than the first sound pickup device among the at least one sound pickup device; 响应于针对所述目标视频数据与所述新的目标音频数据的确认指令,将所述目标视频数据与所述新的目标音频数据同步输出,得到目标音视频数据。In response to a confirmation instruction for the target video data and the new target audio data, the target video data and the new target audio data are synchronously output to obtain target audio and video data. 2.根据权利要求1所述的录像设备,其特征在于,所述展示指令为预览指令;2. The video recording device according to claim 1, characterized in that the display instruction is a preview instruction; 所述录像设备,还用于将用于存储来自于图像采集设备的原始音视频数据的缓冲区作为所述图像采集设备对应的缓冲区,并记录于数据缓冲区信息库中;将用于存储来自于拾音设备的音频数据的缓冲区作为所述拾音设备对应的缓冲区,并记录于所述数据缓冲区信息库中;其中,来自于不同的图像采集设备的音视频数据被存储于不同的缓冲区,来自于不同的拾音设备的音频数据被存储于不同的缓冲区;The video recording device is further used to use the buffer for storing the original audio and video data from the image acquisition device as the buffer corresponding to the image acquisition device, and record it in the data buffer information library; use the buffer for storing the audio data from the sound pickup device as the buffer corresponding to the sound pickup device, and record it in the data buffer information library; wherein the audio and video data from different image acquisition devices are stored in different buffers, and the audio data from different sound pickup devices are stored in different buffers; 所述录像设备具体用于,响应于预览指令,在所述录像设备的数据缓冲区信息库中查找第一数据缓冲区和第二数据缓冲区,从所述第一数据缓冲区中读取所述预览指令所指示的目标图像采集设备对应的原始音视频数据,从所述第二数据缓冲区中读取所述目标图像采集设备绑定的设备对应的音频数据;其中,所述第一数据缓冲区为所述预览指令所指示的目标图像采集设备对应的缓冲区,所述第二数据缓冲区为所述目标图像采集设备绑定的设备对应的缓冲区;The video recording device is specifically configured to, in response to a preview instruction, search a first data buffer and a second data buffer in a data buffer information library of the video recording device, read original audio and video data 
corresponding to a target image acquisition device indicated by the preview instruction from the first data buffer, and read audio data corresponding to a device bound to the target image acquisition device from the second data buffer; wherein the first data buffer is a buffer corresponding to the target image acquisition device indicated by the preview instruction, and the second data buffer is a buffer corresponding to a device bound to the target image acquisition device; 所述录像设备具体用于,对所述预览指令所指示的目标图像采集设备对应的原始音视频数据进行解析,得到所述预览指令所指示的目标图像采集设备对应的视频数据;将所述预览指令所指示的目标图像采集设备对应的视频数据与所述目标图像采集设备绑定的设备对应的音频数据按照时间戳的先后顺序进行同步播放。The recording device is specifically used to parse the original audio and video data corresponding to the target image acquisition device indicated by the preview instruction to obtain the video data corresponding to the target image acquisition device indicated by the preview instruction; and synchronously play the video data corresponding to the target image acquisition device indicated by the preview instruction and the audio data corresponding to the device bound to the target image acquisition device in the order of timestamps. 3.根据权利要求2所述的录像设备,其特征在于,所述切换指令所指示的设备为所述图像采集设备;3. 
The video recording device according to claim 2, characterized in that the device indicated by the switching instruction is the image acquisition device; 所述录像设备还用于,响应于切换指令,终止执行所述从所述第二数据缓冲区中读取所述目标图像采集设备绑定的设备对应的音频数据,对所述预览指令所指示的目标图像采集设备对应的原始音视频数据进行解析的步骤,将所述预览指令所指示的目标图像采集设备对应的原始音视频数据进行播放。The recording device is also used to, in response to a switching instruction, terminate the execution of the step of reading the audio data corresponding to the device bound to the target image acquisition device from the second data buffer, parsing the original audio and video data corresponding to the target image acquisition device indicated by the preview instruction, and playing the original audio and video data corresponding to the target image acquisition device indicated by the preview instruction. 4.根据权利要求2所述的录像设备,其特征在于,所述切换指令所指示的设备为所述第二拾音设备;4. The video recording device according to claim 2, characterized in that the device indicated by the switching instruction is the second sound pickup device; 所述录像设备具体用于,响应于所述切换指令,终止执行所述从所述第二数据缓冲区中读取所述目标图像采集设备绑定的设备对应的音频数据,将所述预览指令所指示的目标图像采集设备对应的视频数据与所述目标图像采集设备绑定的设备对应的音频数据按照时间戳的先后顺序进行同步播放的步骤,在所述录像设备的数据缓冲区信息库中查找所述第二拾音设备对应的第三数据缓冲区,从所述第三数据缓冲区中读取所述第二拾音设备对应的第三音频数据;将所述预览指令所指示的目标图像采集设备对应的视频数据与所述第三音频数据按照时间戳的先后顺序进行同步播放。The video recording device is specifically used to, in response to the switching instruction, terminate the execution of the step of reading the audio data corresponding to the device bound to the target image acquisition device from the second data buffer, synchronously play the video data corresponding to the target image acquisition device indicated by the preview instruction and the audio data corresponding to the device bound to the target image acquisition device in the order of timestamps, search the data buffer information library of the video recording device for the third data buffer corresponding to the second sound pickup device, read the third audio data corresponding to the second sound pickup device from the third data 
buffer; and synchronously play the video data corresponding to the target image acquisition device indicated by the preview instruction and the third audio data in the order of timestamps. 5.根据权利要求1所述的录像设备,其特征在于,所述展示指令为预览指令,所述录像设备还与远程控制设备建立通信连接;5. The video recording device according to claim 1, characterized in that the display instruction is a preview instruction, and the video recording device also establishes a communication connection with a remote control device; 所述录像设备还用于,将所述预览指令所指示的目标图像采集设备对应的原始音视频数据与所述目标图像采集设备绑定的设备对应的音频数据发送至所述远程控制设备,以使得所述远程控制设备对所述预览指令所指示的目标图像采集设备对应的原始音视频数据进行解析,得到所述预览指令所指示的目标图像采集设备对应的视频数据,将所述预览指令所指示的目标图像采集设备对应的视频数据与所述目标图像采集设备绑定的设备对应的音频数据按照时间戳的先后顺序进行同步播放。The video recording device is also used to send the original audio and video data corresponding to the target image acquisition device indicated by the preview instruction and the audio data corresponding to the device bound to the target image acquisition device to the remote control device, so that the remote control device parses the original audio and video data corresponding to the target image acquisition device indicated by the preview instruction, obtains the video data corresponding to the target image acquisition device indicated by the preview instruction, and synchronously plays the video data corresponding to the target image acquisition device indicated by the preview instruction and the audio data corresponding to the device bound to the target image acquisition device in the order of timestamps. 6.根据权利要求5所述的录像设备,其特征在于,所述切换指令所指示的设备为所述图像采集设备;6. 
The video recording device according to claim 5, characterized in that the device indicated by the switching instruction is the image acquisition device; 所述录像设备还用于,在所述远程控制设备的控制下终止执行所述对所述预览指令所指示的目标图像采集设备对应的原始音视频数据进行解析,将所述目标图像采集设备绑定的设备对应的音频数据发送至所述远程控制设备的步骤;The video recording device is also used to terminate the step of parsing the original audio and video data corresponding to the target image acquisition device indicated by the preview instruction and sending the audio data corresponding to the device bound to the target image acquisition device to the remote control device under the control of the remote control device; 其中,所述远程控制设备是响应于切换指令控制所述录像设备的,所述控制设备还响应于切换指令,将所述原始音视频数据按照时间戳的先后顺序进行同步播放。The remote control device controls the video recording device in response to a switching instruction, and the control device also plays the original audio and video data synchronously in the order of timestamps in response to the switching instruction. 7.根据权利要求5所述的录像设备,其特征在于,所述切换指令所指示的设备为所述第二拾音设备;7. The video recording device according to claim 5, characterized in that the device indicated by the switching instruction is the second sound pickup device; 所述录像设备还用于,在所述远程控制设备的控制下终止执行将所述目标图像采集设备绑定的设备对应的音频数据发送至所述远程控制设备的步骤,将所述第二拾音设备对应的音频数据发送至所述远程控制设备;The video recording device is further used to, under the control of the remote control device, terminate the step of sending the audio data corresponding to the device bound to the target image acquisition device to the remote control device, and send the audio data corresponding to the second sound pickup device to the remote control device; 其中,所述远程控制设备是响应于切换指令控制所述录像设备的,所述远程控制设备还响应于切换指令,将所述预览指令所指示的目标图像采集设备对应的视频数据与所述第二拾音设备对应的音频数据按照时间戳的先后顺序进行同步播放。Among them, the remote control device controls the recording device in response to the switching instruction, and the remote control device also responds to the switching instruction to synchronously play the video data corresponding to the target image acquisition device indicated by the preview 
instruction and the audio data corresponding to the second sound pickup device in the order of timestamps. 8.根据权利要求1所述的录像设备,其特征在于,所述展示指令为回放指令;8. The video recording device according to claim 1, wherein the display instruction is a playback instruction; 所述录像设备,还用于将用于存储来自于图像采集设备的原始音视频数据的缓冲区作为所述图像采集设备对应的缓冲区,并记录于数据缓冲区信息库中;将用于存储来自于拾音设备的音频数据的缓冲区作为所述拾音设备对应的缓冲区,并记录于所述数据缓冲区信息库中;其中,来自于不同的图像采集设备的音视频数据被存储于不同的缓冲区,来自于不同的拾音设备的音频数据被存储于不同的缓冲区;The video recording device is further used to use the buffer for storing the original audio and video data from the image acquisition device as the buffer corresponding to the image acquisition device, and record it in the data buffer information library; use the buffer for storing the audio data from the sound pickup device as the buffer corresponding to the sound pickup device, and record it in the data buffer information library; wherein the audio and video data from different image acquisition devices are stored in different buffers, and the audio data from different sound pickup devices are stored in different buffers; 所述录像设备还用于,响应于回放指令,为所述回放指令所指示的目标图像采集设备分配存放码流的第一码流缓冲区,为所述拾音设备分配存放码流的第二码流缓冲区;在所述录像设备的数据缓冲区信息库中查找第一数据缓冲区和第二数据缓冲区,从所述第一数据缓冲区中读取所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的原始音视频数据,将所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的原始音视频数据存放至所述第一码流缓冲区;从所述第二数据缓冲区中读取所述目标图像采集设备绑定的设备在所述回放指令所指示的时间范围内的音频数据,将所述目标图像采集设备绑定的设备在所述回放指令所指示的时间范围内的音频数据存放至第二码流缓冲区;其中,所述第一数据缓冲区为所述回放指令所指示的目标图像采集设备对应的缓冲区,所述第二数据缓冲区为所述目标图像采集设备绑定的设备对应的缓冲区;The video recording device is also used to, in response to a playback instruction, allocate a first code stream buffer for storing code streams to the target image acquisition device indicated by the playback instruction, and allocate a second code stream buffer for storing code streams to the sound pickup device; search for a first data buffer and a second data buffer in a data buffer information library of the video recording device, read from the first data buffer the original audio and video data of the target image acquisition device 
indicated by the playback instruction within the time range indicated by the playback instruction, and store the original audio and video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction in the first code stream buffer; read from the second data buffer the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction, and store the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction in the second code stream buffer; wherein the first data buffer is a buffer corresponding to the target image acquisition device indicated by the playback instruction, and the second data buffer is a buffer corresponding to the device bound to the target image acquisition device; 所述录像设备具体用于,从所述第一码流缓冲区中读取所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的原始音视频数据,并存放至第一码流解析缓冲区,从所述第二码流缓冲区中读取所述目标图像采集设备绑定的设备在所述回放指令所指示的时间范围内的音频数据,并存放至第二码流解析缓冲区;对所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的原始音视频数据进行解析,得到所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的视频数据;The video recording device is specifically used to read the original audio and video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction from the first code stream buffer, and store it in the first code stream parsing buffer; read the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction from the second code stream buffer, and store it in the second code stream parsing buffer; parse the original audio and video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction to obtain the video data of the target image acquisition device indicated by the playback instruction within the time range indicated 
by the playback instruction; 所述录像设备还用于,从所述第一码流解析缓冲区中读取所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的视频数据,从所述第二码流解析缓冲区中读取所述目标图像采集设备绑定的设备在所述回放指令所指示的时间范围内的音频数据;将所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的视频数据与所述目标图像采集设备绑定的设备在所述回放指令所指示的时间范围内的音频数据进行解码同步播放。The video recording device is also used to read from the first code stream parsing buffer the video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction, and read from the second code stream parsing buffer the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction; and decode and play synchronously the video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction and the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction. 9.根据权利要求8所述的录像设备,其特征在于,所述切换指令所指示的设备为所述图像采集设备;9. 
The video recording device according to claim 8, characterized in that the device indicated by the switching instruction is the image acquisition device; 所述录像设备还用于,响应于切换指令,终止执行所述从所述第二数据缓冲区中读取所述目标图像采集设备绑定的设备在所述回放指令所指示的时间范围内的音频数据,将所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的原始音视频数据存放至第一码流解析缓冲区的步骤,释放所述第二码流缓冲区;将所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的原始音视频数据存放至解码缓冲区;The video recording device is also used to, in response to the switching instruction, terminate the execution of the steps of reading the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction from the second data buffer, storing the original audio and video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction in the first code stream parsing buffer, and release the second code stream buffer; storing the original audio and video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction in the decoding buffer; 所述录像设备具体用于,从所述解码缓冲区读取所述原始音视频数据,对所述原始音视频数据进行解码播放。The video recording device is specifically used to read the original audio and video data from the decoding buffer, and decode and play the original audio and video data. 10.根据权利要求8所述的录像设备,其特征在于,所述切换指令所指示的设备为所述第二拾音设备;10. 
The video recording device according to claim 8, characterized in that the device indicated by the switching instruction is the second sound pickup device; 所述录像设备还用于,响应于所述切换指令,释放所述第二码流缓冲区,为所述第二拾音设备分配存放码流的第三码流缓冲区;在所述录像设备的数据缓冲区信息库中查找第三数据缓冲区;从所述第三数据缓冲区中读取所述第二拾音设备在所述回放指令所指示的时间范围内的音频数据,将所述第二拾音设备在所述回放指令所指示的时间范围内的音频数据存放至第三码流缓冲区;其中,所述第三数据缓冲区为所述第二拾音设备对应的缓冲区;The video recording device is further configured to, in response to the switching instruction, release the second code stream buffer and allocate a third code stream buffer for storing the code stream to the second sound pickup device; search for a third data buffer in a data buffer information library of the video recording device; read the audio data of the second sound pickup device within the time range indicated by the playback instruction from the third data buffer, and store the audio data of the second sound pickup device within the time range indicated by the playback instruction in the third code stream buffer; wherein the third data buffer is a buffer corresponding to the second sound pickup device; 所述录像设备还用于,从所述第三码流缓冲区中读取所述第二拾音设备在所述回放指令所指示的时间范围内的音频数据,并存放至第三码流解析缓冲区;The video recording device is further used to read the audio data of the second sound pickup device within the time range indicated by the playback instruction from the third code stream buffer, and store it in the third code stream parsing buffer; 所述录像设备具体用于,将所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的视频数据与所述第二拾音设备在所述回放指令所指示的时间范围内的音频数据进行解码同步播放。The video recording device is specifically used to decode and synchronously play the video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction and the audio data of the second sound pickup device within the time range indicated by the playback instruction. 11.根据权利要求1所述的录像设备,其特征在于,所述展示指令为回放指令,所述录像设备还与远程控制设备建立通信连接;11. 
The video recording device according to claim 1, characterized in that the display instruction is a playback instruction, and the video recording device also establishes a communication connection with a remote control device; 所述录像设备还用于,将所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的原始视频数据与所述目标图像采集设备绑定的设备在所述回放指令所指示的时间范围内的音频数据发送至所述远程控制设备,以使得所述远程控制设备对所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的原始音视频数据进行解析,得到所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的视频数据,将所述回放指令所指示的图像采集设备在所述回放指令所指示的时间范围内的视频数据与所述目标图像采集设备绑定的设备在所述回放指令所指示的时间范围内的音频数据进行同步播放。The video recording device is also used to send the original video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction and the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction to the remote control device, so that the remote control device parses the original audio and video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction, obtains the video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction, and synchronously plays the video data of the image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction and the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction. 12.根据权利要求11所述的录像设备,其特征在于,所述切换指令所指示的设备为所述图像采集设备;12. 
The video recording device according to claim 11, characterized in that the device indicated by the switching instruction is the image acquisition device; 所述录像设备还用于,在所述远程控制设备的控制下终止执行所述对所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的原始音视频数据进行解析,将目标图像采集设备绑定的设备在所述回放指令所指示的时间范围内的音频数据发送至所述远程控制设备的步骤;The video recording device is also used to terminate the step of parsing the original audio and video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction, and sending the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction to the remote control device under the control of the remote control device; 其中,所述远程控制设备是响应于切换指令控制所述录像设备的,所述远程控制设备还响应于所述切换指令将所述原始音视频数据进行解码播放。The remote control device controls the video recording device in response to a switching instruction, and the remote control device also decodes and plays the original audio and video data in response to the switching instruction. 13.根据权利要求11所述的录像设备,其特征在于,所述切换指令所指示的设备为所述第二拾音设备;13. 
The video recording device according to claim 11, wherein the device indicated by the switching instruction is the second sound pickup device; 所述录像设备还用于,在所述远程控制设备的控制下终止执行所述将所述目标图像采集设备绑定的设备在所述回放指令所指示的时间范围内的音频数据发送至所述远程控制设备的步骤,将所述第二拾音设备在所述回放指令所指示的时间范围内的音频数据发送至所述远程控制设备;The video recording device is further used to, under the control of the remote control device, terminate the step of sending the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction to the remote control device, and send the audio data of the second sound pickup device within the time range indicated by the playback instruction to the remote control device; 其中,所述远程控制设备是响应于所述切换指令控制所述录像设备的,所述远程控制设备还将所述回放指令所指示的目标图像采集设备在所述回放指令所指示的时间范围内的视频数据与所述第二拾音设备在所述回放指令所指示的时间范围内的音频数据进行解码同步播放。Among them, the remote control device controls the recording device in response to the switching instruction, and the remote control device also decodes and synchronously plays the video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction and the audio data of the second sound pickup device within the time range indicated by the playback instruction. 14.根据权利要求1所述的录像设备,其特征在于,所述展示指令为回放指令,所述录像设备还用于,响应于回退指令,终止执行所述同步展示时间戳相同的所述目标视频数据与所述新的目标音频数据的步骤,将来自于所述切换指令所指示的设备的目标音频数据与所述目标视频数据从所述回退指令所指示的时间开始进行同步展示。14. The recording device according to claim 1 is characterized in that the display instruction is a playback instruction, and the recording device is also used to, in response to a rollback instruction, terminate the execution of the step of synchronously displaying the target video data and the new target audio data with the same timestamp, and synchronously display the target audio data and the target video data from the device indicated by the switching instruction from the time indicated by the rollback instruction. 15.一种音视频同步输出方法,其特征在于,应用于录像设备,所述方法包括:15. 
A method for synchronously outputting audio and video, characterized in that it is applied to a video recording device, the method comprising: in response to a display instruction, acquiring original audio and video data from a target image acquisition device indicated by the display instruction; parsing the original audio and video data to obtain target video data and first audio data, acquiring second audio data from a device bound to the target image acquisition device as target audio data, and synchronously displaying the target video data and the target audio data having the same timestamp; in response to a switching instruction, terminating the step of acquiring the second audio data from the device bound to the target image acquisition device, switching the device bound to the target image acquisition device to the device indicated by the switching instruction, and acquiring third audio data from the newly bound second device as new target audio data; synchronously displaying the target video data and the target audio data having the same timestamp; and, in response to a confirmation instruction for the target video data and the new target audio data, synchronously outputting the target video data and the new target audio data to obtain target audio and video data; wherein the video recording device has pre-established a communication connection with at least one sound pickup device,
and has pre-established a communication connection with at least one image acquisition device; the device bound to the target image acquisition device is the target image acquisition device or a first sound pickup device; the device indicated by the switching instruction is the target image acquisition device or a second sound pickup device; the first sound pickup device is one of the at least one sound pickup device, and the second sound pickup device is one of the at least one sound pickup device other than the first sound pickup device.

16. The method according to claim 15, wherein the display instruction is a preview instruction, and the method further comprises: taking a buffer used to store original audio and video data from an image acquisition device as the buffer corresponding to that image acquisition device, and recording it in a data buffer information base; taking a buffer used to store audio data from a sound pickup device as the buffer corresponding to that sound pickup device, and recording it in the data buffer information base; wherein audio and video data from different image acquisition devices are stored in different buffers, and audio data from different sound pickup devices are stored in different buffers; the acquiring of the original audio and video data from the target image acquisition device indicated by the display instruction and of the second audio data from the device bound to the target image acquisition device comprises:
in response to the preview instruction, searching the data buffer information base of the video recording device for a first data buffer and a second data buffer, reading the original audio and video data corresponding to the target image acquisition device indicated by the preview instruction from the first data buffer, and reading the audio data corresponding to the device bound to the target image acquisition device from the second data buffer; wherein the first data buffer is the buffer corresponding to the target image acquisition device indicated by the preview instruction, and the second data buffer is the buffer corresponding to the device bound to the target image acquisition device; and the synchronously displaying of the target video data and the target audio data having the same timestamp comprises: synchronously playing, in timestamp order, the video data corresponding to the target image acquisition device indicated by the preview instruction and the audio data corresponding to the device bound to the target image acquisition device.

17.
The method according to claim 16, wherein the device indicated by the switching instruction is the image acquisition device; the method further comprises: in response to the switching instruction, terminating the steps of reading, from the second data buffer, the audio data corresponding to the device bound to the target image acquisition device and parsing the original audio and video data corresponding to the target image acquisition device indicated by the preview instruction, and playing the original audio and video data corresponding to the target image acquisition device indicated by the preview instruction.

18. The method according to claim 16, wherein the device indicated by the switching instruction is the second sound pickup device; the method further comprises: in response to the switching instruction, terminating the steps of reading, from the second data buffer, the audio data corresponding to the device bound to the target image acquisition device and synchronously playing, in timestamp order, the video data corresponding to the target image acquisition device indicated by the preview instruction and that audio data; searching the data buffer information base of the video recording device for a third data buffer corresponding to the second sound pickup device, and reading third audio data corresponding to the second sound pickup device from the third data buffer; and the synchronously displaying of the target video data and the target audio data having the same timestamp comprises:
synchronously playing, in timestamp order, the video data corresponding to the target image acquisition device indicated by the preview instruction and the third audio data.

19. The method according to claim 15, wherein the display instruction is a playback instruction, and the method further comprises: taking a buffer used to store original audio and video data from an image acquisition device as the buffer corresponding to that image acquisition device, and recording it in a data buffer information base; taking a buffer used to store audio data from a sound pickup device as the buffer corresponding to that sound pickup device, and recording it in the data buffer information base; wherein audio and video data from different image acquisition devices are stored in different buffers, and audio data from different sound pickup devices are stored in different buffers; in response to a playback instruction, allocating a first code stream buffer for storing code streams to the target image acquisition device indicated by the playback instruction, and allocating a second code stream buffer for storing code streams to the sound pickup device; searching the data buffer information base of the video recording device for a first data buffer and a second data buffer; the original audio and video data of the
target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction are read from the first data buffer and stored in the first code stream buffer, and the audio data of the device bound to the target image acquisition device within that time range are read from the second data buffer and stored in the second code stream buffer; wherein the first data buffer is the buffer corresponding to the target image acquisition device indicated by the playback instruction, and the second data buffer is the buffer corresponding to the device bound to the target image acquisition device; the acquiring of the original audio and video data from the target image acquisition device indicated by the display instruction and of the second audio data from the device bound to the target image acquisition device comprises: reading, from the first code stream buffer, the original audio and video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction and storing it in a first code stream parsing buffer, and reading, from the second code stream buffer, the audio data of the device bound to the target image acquisition device within that time range and storing it in a second code stream parsing buffer; the parsing of the
original audio and video data to obtain the target video data and the first audio data comprises: parsing the original audio and video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction to obtain the video data of that device within that time range; the acquiring further comprises: reading, from the first code stream parsing buffer, the video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction, and reading, from the second code stream parsing buffer, the audio data of the device bound to the target image acquisition device within that time range; and the synchronously displaying of the target video data and the target audio data having the same timestamp comprises: decoding and synchronously playing the video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction and the audio data of the device bound to the target image acquisition device within that time range.

20.
The method according to claim 19, characterized in that the device indicated by the switching instruction is the image acquisition device; the method further comprises: in response to the switching instruction, terminating the steps of reading, from the second data buffer, the audio data of the device bound to the target image acquisition device within the time range indicated by the playback instruction and storing the original audio and video data of the target image acquisition device indicated by the playback instruction within that time range in the first code stream parsing buffer, and releasing the second code stream buffer; storing the original audio and video data of the target image acquisition device indicated by the playback instruction within that time range in a decoding buffer; the method further comprising: reading the original audio and video data from the decoding buffer, and decoding and playing the original audio and video data.

21.
The method according to claim 19, wherein the device indicated by the switching instruction is the second sound pickup device; the method further comprises: in response to the switching instruction, releasing the second code stream buffer and allocating a third code stream buffer for storing code streams to the second sound pickup device; searching the data buffer information base of the video recording device for a third data buffer; reading, from the third data buffer, the audio data of the second sound pickup device within the time range indicated by the playback instruction and storing it in the third code stream buffer, wherein the third data buffer is the buffer corresponding to the second sound pickup device; reading, from the third code stream buffer, the audio data of the second sound pickup device within that time range and storing it in a third code stream parsing buffer; and the synchronously displaying of the target video data and the target audio data having the same timestamp comprises: decoding and synchronously playing the video data of the target image acquisition device indicated by the playback instruction within the time range indicated by the playback instruction and the audio data of the second sound pickup device within that time range.
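The claims above repeatedly describe the same core mechanism: each image acquisition device and each sound pickup device writes into its own buffer, a camera is bound to one audio source, video and audio frames carrying the same timestamp are paired for synchronous display, and a switching instruction rebinds the camera to a different audio source. The following is a minimal illustrative sketch of that mechanism in Python — not the patented implementation; the names (`Recorder`, `ingest`, `bind`, `synchronized_frames`) are invented for this example.

```python
from collections import defaultdict


class Recorder:
    """Toy model of the claimed device: one buffer per source device,
    timestamp-matched pairing of video with a switchable audio source."""

    def __init__(self):
        # Data from different devices goes into different buffers
        # (the per-device "data buffer information base" of claims 16/19).
        self.buffers = defaultdict(list)  # device id -> [(timestamp, payload)]
        self.binding = {}                 # camera id -> bound audio device id

    def ingest(self, device_id, timestamp, payload):
        """Store a frame (video or audio) in the buffer of its source device."""
        self.buffers[device_id].append((timestamp, payload))

    def bind(self, camera_id, audio_device_id):
        """The 'switching instruction': rebind the camera's audio source."""
        self.binding[camera_id] = audio_device_id

    def synchronized_frames(self, camera_id):
        """Pair video frames with audio frames that carry the same timestamp,
        in timestamp order (the 'synchronous display' step of claim 15)."""
        # If no pickup device is bound, fall back to the camera's own audio.
        audio_id = self.binding.get(camera_id, camera_id)
        audio = dict(self.buffers[audio_id])  # timestamp -> audio payload
        return [(ts, video, audio[ts])
                for ts, video in sorted(self.buffers[camera_id])
                if ts in audio]
```

In use, rebinding via `bind` immediately changes which audio stream is paired with the camera's video, while the video stream itself is untouched, mirroring the claimed switch between a camera's embedded audio and an external sound pickup device.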
CN202411455041.XA 2024-10-17 2024-10-17 Video recording device and audio and video synchronous output method Active CN119031194B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411455041.XA CN119031194B (en) 2024-10-17 2024-10-17 Video recording device and audio and video synchronous output method


Publications (2)

Publication Number Publication Date
CN119031194A CN119031194A (en) 2024-11-26
CN119031194B true CN119031194B (en) 2024-12-31

Family

ID=93525275


Country Status (1)

Country Link
CN (1) CN119031194B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254560A (en) * 2010-05-19 2011-11-23 安凯(广州)微电子技术有限公司 Audio processing method in mobile digital television recording
CN114302086A (en) * 2021-12-24 2022-04-08 威创集团股份有限公司 Method, device and system for recording and playing back full-screen ultrahigh-resolution picture of display wall

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7079176B1 (en) * 1991-11-25 2006-07-18 Actv, Inc. Digital interactive system for providing full interactivity with live programming events
CN113645481B (en) * 2021-08-23 2023-01-24 歌尔科技有限公司 Video recording method, camera equipment, control terminal and video recording system
CN117812389A (en) * 2023-12-27 2024-04-02 杭州海康威视数字技术股份有限公司 Video recorder, video preview method and electronic equipment




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant