
CN111327939A - Distributed teaching video processing system - Google Patents


Info

Publication number
CN111327939A
Authority
CN
China
Prior art keywords
video
module
human eye
eye feature
teaching
Prior art date
Legal status
Pending
Application number
CN202010114831.7A
Other languages
Chinese (zh)
Inventor
张凌
牟相霖
高晓东
李冠霖
成海秀
Current Assignee
South China University of Technology SCUT
CERNET Corp
Original Assignee
South China University of Technology SCUT
CERNET Corp
Priority date
Filing date
Publication date
Application filed by South China University of Technology (SCUT) and CERNET Corp
Priority to CN202010114831.7A
Publication of CN111327939A
Legal status: Pending


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431: Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 21/4312: Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161: Detection; Localisation; Normalisation
    • G06V 40/165: Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168: Feature extraction; Face representation
    • G06V 40/171: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/18: Eye characteristics, e.g. of the iris
    • G06V 40/19: Sensors therefor
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/18: Eye characteristics, e.g. of the iris
    • G06V 40/193: Preprocessing; Feature extraction
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21: Server components or server architectures
    • H04N 21/218: Source of audio or video content, e.g. local disk arrays
    • H04N 21/2181: Source of audio or video content comprising remotely distributed storage units, e.g. when movies are replicated over a plurality of video servers
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440236: Processing of video elementary streams involving reformatting operations by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Ophthalmology & Optometry (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Geometry (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a distributed teaching video processing system comprising a teaching-video file transfer module, a teaching-video GPU-accelerated processing module, and a teaching-video automatic eye-mosaic module. Each module can be implemented as a corresponding software module in C++, yielding a teaching video processing system that runs on servers. The file transfer module downloads the original video file from the video storage server and uploads the processed video file. The GPU-accelerated processing module applies GPU acceleration to video trimming, video transcoding, and video resolution adjustment. The automatic eye-mosaic module automatically mosaics the human eyes appearing in a video. The invention moves local video processing onto servers, which speeds up processing, and provides an automatic eye-mosaic function that ordinary video processing software lacks, simplifying and streamlining the video processing workflow.

Description

A Distributed Teaching Video Processing System

Technical Field

The invention relates to the technical field of video processing, and in particular to a distributed teaching video processing system.

Background

At present, in a learning management system we usually need to record teaching videos of teachers' lectures and then turn them into online courses for students. However, since the raw footage usually comes in a very large file size and resolution, and in order to save bandwidth in the learning management system, these videos generally need some processing, such as trimming the original video, lowering its resolution to shrink the file, or transcoding it. In addition, for some teaching videos of traditional Chinese medicine (TCM) diagnosis, higher-level processing is also needed, such as mosaicking the patients' eyes to protect their privacy.

Faced with these video processing tasks, we usually resort to third-party video processing software such as Adobe Premiere. For teachers, however, these tasks are very tedious and repetitive; such software is generally aimed at professionals, offers fairly basic functions, lacks the higher-level features needed here, and is unfriendly to ordinary users. Moreover, when users handle these tasks on their own personal computers they depend heavily on a single machine's CPU, so processing efficiency is very low.

Summary of the Invention

The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by proposing a distributed teaching video processing system. The system fully exploits the parallel processing capability of GPUs for teaching video processing; it addresses the slow speed and tedious operation of local processing and the dependence on third-party video processing software, and it provides video processing functions that such software lacks. The system acts as a teaching video processing engine: on the one hand it applies GPU acceleration to traditional processing such as transcoding, resolution adjustment, and trimming; on the other hand it offers higher-level functions such as automatically mosaicking the human eyes in teaching videos.

To achieve the above purpose, the technical solution provided by the present invention is as follows. A distributed teaching video processing system comprises:

a teaching-video file transfer module, which downloads the original video file from the video storage server and uploads the video file processed by the video processing system;

a teaching-video GPU-accelerated processing module, which adjusts the resolution of a video according to the video name and target resolution entered by the user, transcodes a video according to the video name and target format entered by the user, and trims a video according to the video name, start time, and end time entered by the user;

a teaching-video automatic eye-mosaic module, which decomposes a teaching video into image frames, detects the faces in each image, detects the eye feature points within each face, mosaics the eye feature-point regions, and assembles the mosaicked image frames and the audio back into a video.

Further, the teaching-video file transfer module comprises a user login module, a Cookie handling module, a video file download module, and a video file upload module, wherein:

the user login module logs in to the video storage server with the user name and password entered by the user;

the Cookie handling module is responsible for saving the Cookies returned by the video storage server to local storage and for loading them from local storage;

the video file download module is responsible for downloading video files from the video storage server;

the video file upload module is responsible for uploading the processed video to the video storage server.

Further, the teaching-video GPU-accelerated processing module comprises a video resolution adjustment module, a video transcoding module, and a video trimming module, wherein:

the video resolution adjustment module adjusts the video resolution: it parses the request parameters of the HTTP POST request to obtain the target resolution, then uses the video processing tool FFMPEG to resize the specified video file; during resizing it fully exploits the GPU's parallel computing capability, using NVIDIA's NVENC and NVDEC to accelerate video encoding and decoding, achieving GPU-accelerated video processing and higher efficiency;

the video transcoding module converts the video format: it parses the request parameters of the HTTP POST request to obtain the target video format, then uses the video processing tool FFMPEG to convert the specified video file; during conversion it likewise exploits the GPU's parallel computing capability via NVIDIA's NVENC and NVDEC to accelerate encoding and decoding;

the video trimming module trims the video: it parses the request parameters of the HTTP POST request to obtain the start time and end time of the segment to keep, then uses the video processing tool FFMPEG to trim the specified video file.

Further, the teaching-video automatic eye-mosaic module comprises an in-image face detection module, an in-face eye feature-point detection module, an inter-frame eye feature-point tracking module, an eye feature-point region mosaic module, and a frame-and-audio video synthesis module, wherein:

the in-image face detection module automatically detects the coordinates of the face regions in an image: a deep-learning neural network model for face detection is trained in advance and its weights are saved locally; at detection time the weights are loaded from local storage, and when an image frame is fed into the model it automatically detects the coordinate region of each face in the image;

the in-face eye feature-point detection module automatically detects the coordinates of the eye feature points in a face: a deep-learning neural network model that detects eye feature points from a face is trained in advance and its weights are saved locally; at detection time the weights are loaded from local storage, and when the face region of an image frame is fed into the model it automatically detects the eye feature points of the face;

the inter-frame eye feature-point tracking module tracks the eye feature points across two consecutive frames: a deep-learning neural network model for eye feature-point tracking is trained in advance and its weights are saved locally; at tracking time the weights are loaded from local storage, and when the face region of the next frame is fed into the model the tracker first tracks the eye feature points; if the tracked result meets expectations, processing continues with the next frame, otherwise the frame is handed back to the eye feature-point detection module for re-detection;

the eye feature-point region mosaic module mosaics the eye feature-point regions: after the eye feature points have been detected in an image, the corresponding mosaic region is computed from them and the corresponding eye mosaic is generated;

the frame-and-audio video synthesis module combines the mosaicked image frames and the audio into a video file: once the mosaicked frames are obtained, they are combined with the audio previously separated from the video, using the video processing tool FFMPEG to synthesize the final mosaicked video.

Compared with the prior art, the present invention has the following advantages and beneficial effects:

1. Teaching-video processing tasks are moved from the local machine to servers. The servers fully exploit the GPU's parallel processing capability and, combined with the video processing tool FFMPEG, accelerate video transcoding, resolution adjustment, and trimming. This greatly increases processing speed and simplifies the workflow, and users no longer need to learn and operate third-party video processing software.

2. Building on recent deep learning and neural network techniques and ample GPU computing power, the system deploys a face detection model, an eye feature-point detection model, and an eye feature-point tracking model, providing a high-level automatic eye-mosaic function that ordinary video processing software lacks.

3. The video processing system is deployed in a distributed manner: the teaching-video storage server is separated from the teaching-video processing servers, the processing system consists of multiple video processing servers, and the web server Nginx distributes processing requests among them to achieve load balancing.

Description of Drawings

FIG. 1 is a schematic diagram of the overall architecture of the system of the present invention.

FIG. 2 is the overall flow chart of the video processing process.

Detailed Description

The present invention is further described below in conjunction with a specific embodiment.

The distributed teaching video processing system provided in this embodiment is a server-side teaching video processing system developed with Visual Studio Code in the C++ and Python languages. As shown in FIG. 1, the system uses the web server Nginx for request forwarding and load balancing, passing video processing requests on to FastCGI servers for handling. The teaching video processing system comprises:

a teaching-video file transfer module, which downloads the original video file from the video storage server and uploads the video file processed by the video processing system;

a teaching-video GPU-accelerated processing module, which adjusts the resolution of a video according to the video name and target resolution entered by the user, transcodes a video according to the video name and target format entered by the user, and trims a video according to the video name and the start and end times entered by the user;

a teaching-video automatic eye-mosaic module, which decomposes a teaching video into image frames, detects the faces in each image, detects the eye feature points within each face, mosaics the eye feature-point regions, and assembles the mosaicked image frames into a video.

The teaching-video file transfer module comprises a user login module, a Cookie handling module, a video file download module, and a video file upload module, wherein:

the user login module logs in to the video storage server with the user name and password entered by the user;

the Cookie handling module is responsible for saving the Cookies returned by the video storage server to local storage and for loading them from local storage;
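The save-and-reload behaviour of the Cookie handling module can be sketched in Python with the standard library's `http.cookiejar`; the file name, domain, and cookie values below are illustrative assumptions, not details from the patent.

```python
import http.cookiejar
import time

COOKIE_FILE = "storage_server_cookies.txt"  # hypothetical local path

def save_cookies(jar: http.cookiejar.MozillaCookieJar) -> None:
    # Persist the cookies returned by the video storage server to local disk.
    jar.save(COOKIE_FILE, ignore_discard=True, ignore_expires=True)

def load_cookies() -> http.cookiejar.MozillaCookieJar:
    # Reload previously saved cookies so later requests stay authenticated.
    jar = http.cookiejar.MozillaCookieJar()
    jar.load(COOKIE_FILE, ignore_discard=True, ignore_expires=True)
    return jar

# Build a jar with one illustrative session cookie and round-trip it.
jar = http.cookiejar.MozillaCookieJar()
jar.set_cookie(http.cookiejar.Cookie(
    version=0, name="SESSIONID", value="abc123", port=None, port_specified=False,
    domain="storage.example.edu", domain_specified=True, domain_initial_dot=False,
    path="/", path_specified=True, secure=False, expires=int(time.time()) + 3600,
    discard=False, comment=None, comment_url=None, rest={}))
save_cookies(jar)
restored = load_cookies()
print([c.name for c in restored])
```

A jar like this can be attached to an `urllib.request` opener (or passed to an HTTP client) so that downloads and uploads reuse the login session.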

the video file download module is responsible for downloading video files from the video storage server;

the video file upload module is responsible for uploading the processed video to the video storage server.

The teaching-video GPU-accelerated processing module comprises a video resolution adjustment module, a video transcoding module, and a video trimming module, wherein:

the video resolution adjustment module adjusts the video resolution: it parses the request parameters of the HTTP POST request to obtain the target resolution, then uses the FFMPEG tool to resize the specified video file; during resizing it fully exploits the GPU's parallel computing capability, using NVIDIA's NVENC and NVDEC to accelerate video encoding and decoding, achieving GPU-accelerated video processing and higher efficiency;
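As an illustration of the FFMPEG invocation such a resizing module might issue, the sketch below builds (but does not run) a command line that decodes through NVDEC (`-hwaccel cuda`), rescales, and re-encodes with the NVENC H.264 encoder; the file names and exact flag choices are assumptions, not taken from the patent.

```python
def build_resize_command(src: str, dst: str, width: int, height: int) -> list[str]:
    """Compose an FFMPEG command that resizes `src` to width x height using
    NVIDIA hardware decode (NVDEC) and encode (NVENC)."""
    return [
        "ffmpeg",
        "-hwaccel", "cuda",                 # decode on the GPU via NVDEC
        "-i", src,
        "-vf", f"scale={width}:{height}",   # plain scale filter, for simplicity
        "-c:v", "h264_nvenc",               # encode on the GPU via NVENC
        "-c:a", "copy",                     # pass the audio through untouched
        dst,
    ]

cmd = build_resize_command("lecture_raw.mp4", "lecture_720p.mp4", 1280, 720)
print(" ".join(cmd))
```

In an FFMPEG build with CUDA filters, swapping `scale` for `scale_cuda` would keep frames on the GPU for the whole pipeline; the plain filter above is the more portable choice.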

the video transcoding module converts the video format: it parses the request parameters of the HTTP POST request to obtain the target video format, then uses the FFMPEG tool to convert the specified video file; during conversion it likewise exploits the GPU's parallel computing capability via NVIDIA's NVENC and NVDEC to accelerate encoding and decoding;

the video trimming module trims the video: it parses the request parameters of the HTTP POST request to obtain the start time and end time of the segment to keep, then uses the FFMPEG tool to trim the specified video file.
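The FFMPEG invocation that the trimming module would issue can be sketched by composing a command from the start and end times parsed out of the HTTP POST request; the file names are illustrative, and stream copy is assumed so no re-encoding is needed.

```python
def build_trim_command(src: str, dst: str, start: str, end: str) -> list[str]:
    """Cut the segment [start, end] out of `src` without re-encoding.
    Times use FFMPEG's HH:MM:SS syntax; stream copy keeps trimming fast."""
    return [
        "ffmpeg",
        "-i", src,
        "-ss", start,   # start of the segment to keep
        "-to", end,     # end of the segment to keep
        "-c", "copy",   # copy streams instead of re-encoding
        dst,
    ]

cmd = build_trim_command("lecture_raw.mp4", "lecture_clip.mp4", "00:05:00", "00:15:00")
print(" ".join(cmd))
```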

The teaching-video automatic eye-mosaic module comprises an in-image face detection module, an in-face eye feature-point detection module, an inter-frame eye feature-point tracking module, an eye feature-point region mosaic module, and a frame-and-audio video synthesis module, wherein:

the in-image face detection module automatically detects the coordinates of the face regions in an image: a deep-learning neural network model for face detection is trained in advance; the model improves on the FaceBoxes face detector by refining its network structure and increasing the network's depth and width, and hard-to-classify samples that the system does not need are removed before training. After training, the model weights are saved locally. At detection time the weights are loaded from local storage; when an image frame is fed into the model, it automatically detects the coordinate region of each face in the image, including the coordinates of the upper-left corner, the lower-right corner, and the center of the face region.
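The patent's improved FaceBoxes model is not public, so the sketch below only mimics the interface such a detector would expose: a bounding box given by its upper-left and lower-right corners, with the center coordinate the text mentions derived from them. The stub detector and its fixed box are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class FaceRegion:
    """Face bounding box as the text describes it: upper-left corner,
    lower-right corner, and a derived center point."""
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

def detect_faces(frame) -> list[FaceRegion]:
    # Stand-in for the trained network: a real implementation would load the
    # saved weights and run inference on `frame`. Here we return a fixed box.
    return [FaceRegion(x1=100, y1=80, x2=220, y2=240)]

faces = detect_faces(frame=None)
print(faces[0].center)
```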

the in-face eye feature-point detection module automatically detects the coordinates of the eye feature points in a face: a deep-learning neural network model that detects eye feature points from a face is trained in advance; the model improves the O-Net of the MTCNN model, and the data set is adjusted for training. After training, the network weights are saved locally. At detection time the weights are loaded from local storage; when the face region of an image frame is fed into the model, it automatically detects the eye feature points of the face;

the eye feature-point tracking module tracks the eye feature points across two consecutive frames: a deep-learning neural network model for eye feature-point tracking is trained in advance and its weights are saved locally. At tracking time the weights are loaded from local storage; when the face region of the next frame is fed into the model, the tracker first tracks the eye feature points; if the tracked result meets expectations, processing continues with the next frame, otherwise the frame is handed back to the eye feature-point detection module for re-detection;
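The track-or-redetect control flow described above can be sketched with stub functions standing in for the two networks; the confidence threshold, stub results, and the idea of expressing "meets expectations" as a confidence score are assumptions for illustration.

```python
CONF_THRESHOLD = 0.8  # assumed acceptance threshold for a tracked result

def track_eye_points(prev_points, face_region):
    # Stand-in for the tracking network: returns (points, confidence).
    return prev_points, 0.5  # pretend tracking drifted on this frame

def detect_eye_points(face_region):
    # Stand-in for the eye feature-point detection network.
    return [(110, 120), (180, 122)]

def eye_points_for_frame(prev_points, face_region):
    """Prefer the cheap tracker; fall back to full detection when the
    tracked result does not meet expectations, as the text specifies."""
    points, confidence = track_eye_points(prev_points, face_region)
    if confidence >= CONF_THRESHOLD:
        return points
    return detect_eye_points(face_region)

points = eye_points_for_frame(prev_points=[(108, 119), (179, 121)], face_region=None)
print(points)
```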

the eye feature-point region mosaic module mosaics the eye feature-point regions: after the eye feature points have been detected in an image, the mosaic region is computed from them and the corresponding mosaic is generated;
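One common way to generate such a mosaic, shown here as a minimal NumPy sketch, is to block-average the pixels of the rectangular region computed from the eye feature points; the block size and the tiny test frame are illustrative assumptions.

```python
import numpy as np

def mosaic_region(frame: np.ndarray, x1: int, y1: int, x2: int, y2: int,
                  block: int = 8) -> np.ndarray:
    """Replace frame[y1:y2, x1:x2] with block x block averaged tiles,
    producing the familiar pixelated mosaic."""
    out = frame.copy()
    for y in range(y1, y2, block):
        for x in range(x1, x2, block):
            tile = out[y:min(y + block, y2), x:min(x + block, x2)]
            tile[...] = tile.mean(axis=(0, 1), keepdims=True).astype(out.dtype)
    return out

# Tiny grayscale "frame" with a gradient; mosaic the whole 8x8 region.
frame = np.arange(64, dtype=np.uint8).reshape(8, 8)
masked = mosaic_region(frame, 0, 0, 8, 8, block=8)
print(int(masked[0, 0]), int(masked[7, 7]))
```

The same function works unchanged on a color frame of shape (height, width, 3), since the mean is taken only over the two spatial axes.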

the frame-and-audio video synthesis module combines the mosaicked image frames and the audio into a video file: once the mosaicked frames are obtained, they are combined with the audio previously separated from the video, using the FFMPEG tool to synthesize the final video.
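The muxing step reduces to one more FFMPEG invocation; the sketch below composes a command that turns a numbered frame sequence plus the previously separated audio track back into a video. The paths, frame rate, and codec choices are assumptions, not details from the patent.

```python
def build_mux_command(frame_pattern: str, audio: str, dst: str, fps: int = 25) -> list[str]:
    """Compose an FFMPEG command that joins mosaicked frames and the
    original audio into the final video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),
        "-i", frame_pattern,    # e.g. frames/%06d.png
        "-i", audio,            # audio separated from the source video earlier
        "-c:v", "libx264",      # could be h264_nvenc when a GPU is available
        "-pix_fmt", "yuv420p",  # widest player compatibility
        "-c:a", "aac",
        "-shortest",            # stop at the shorter of the two inputs
        dst,
    ]

cmd = build_mux_command("frames/%06d.png", "audio.aac", "lecture_mosaic.mp4")
print(" ".join(cmd))
```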

The overall processing flow of the distributed teaching video processing system of this embodiment is shown in FIG. 2. First, a video is uploaded from the browser to the video storage server. After the upload succeeds, the browser sends a video processing request to the video processing system; on receiving the request, the system downloads the video from the video storage server, processes it, and then uploads the processed video back to the video storage server.

The embodiment described above is only a preferred embodiment of the present invention and does not limit the scope of its implementation; any change made according to the shape and principle of the present invention shall fall within its scope of protection.

Claims (4)

1. A distributed teaching video processing system, comprising:
the teaching video file transmission module is used for downloading an original video file from the video storage server and uploading the video file processed by the video processing system;
the teaching video GPU acceleration processing module is used for adjusting the resolution of a video according to the video name and the video resolution input by a user, transcoding the video according to the video name and the video format input by the user, and cutting the video according to the video name, the starting time and the ending time input by the user;
the automatic human eye mosaic module for teaching video is used for decomposing the teaching video into image frames, detecting human faces in the images, detecting human eye feature points in the faces, applying a mosaic to the human eye feature point regions, and synthesizing the mosaicked image frames and the audio into a video.
2. The distributed teaching video processing system according to claim 1, wherein the teaching video file transmission module comprises a user login module, a Cookie processing module, a video file downloading module and a video file uploading module, wherein:
the user login module logs in the video storage server according to a user name and a password input by a user;
the Cookie processing module is responsible for storing Cookies returned by the video storage server to the local and loading the Cookies from the local;
the video file downloading module is responsible for downloading video files from the video storage server;
and the video file uploading module is responsible for uploading the processed video to the video storage server.
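The Cookie processing module of claim 2 (save server cookies locally, reload them later) can be sketched with Python's standard library; the file name, domain and cookie values below are illustrative assumptions.

```python
import http.cookiejar
import os
import tempfile

# Hypothetical sketch of the Cookie processing module: persist cookies
# returned by the video storage server and reload them for later requests.

def save_cookies(jar, path):
    mj = http.cookiejar.MozillaCookieJar(path)
    for c in jar:
        mj.set_cookie(c)
    # ignore_discard keeps session cookies that would otherwise be dropped
    mj.save(ignore_discard=True, ignore_expires=True)

def load_cookies(path):
    mj = http.cookiejar.MozillaCookieJar(path)
    mj.load(ignore_discard=True, ignore_expires=True)
    return mj

# Build a session cookie such as the server might return after login.
cookie = http.cookiejar.Cookie(
    version=0, name="session_id", value="abc123", port=None,
    port_specified=False, domain="storage.example.com",
    domain_specified=True, domain_initial_dot=False, path="/",
    path_specified=True, secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={}, rfc2109=False)

jar = http.cookiejar.CookieJar()
jar.set_cookie(cookie)

path = os.path.join(tempfile.gettempdir(), "upload_cookies.txt")
save_cookies(jar, path)
restored = load_cookies(path)
print([c.name for c in restored])             # → ['session_id']
```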
3. The distributed teaching video processing system according to claim 1, wherein the teaching video GPU acceleration processing module comprises a video resolution adjusting module, a video transcoding module and a video clipping module, wherein:
the video resolution adjusting module adjusts the video resolution: the target resolution of the video is obtained by parsing the request parameters in the HTTP POST request, and the video processing tool FFMPEG then adjusts the resolution of the specified video file; during the adjustment the parallel computing capability of the GPU is fully exploited, and NVENC and NVDEC provided by NVIDIA accelerate video encoding and decoding, realizing GPU-accelerated video processing and improving processing efficiency;
the video transcoding module converts the video format: the target video format is obtained by parsing the request parameters in the HTTP POST request, and the video processing tool FFMPEG then converts the format of the specified video file; during the conversion the parallel computing capability of the GPU is fully exploited, and NVENC and NVDEC provided by NVIDIA accelerate video encoding and decoding, realizing GPU-accelerated video processing and improving processing efficiency;
the video clipping module clips videos: the start time and end time of the segment to be cut are obtained by parsing the request parameters in the HTTP POST request, and the video processing tool FFMPEG then cuts the specified video file.
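The three FFMPEG operations of claim 3 can be sketched as command-line assembly; the flags follow FFMPEG's NVDEC/NVENC options, but the exact parameter choices are illustrative assumptions, and actually running them requires an FFMPEG build with NVIDIA hardware acceleration.

```python
# Illustrative command builders for the GPU acceleration processing module.
# NVDEC decoding is enabled via -hwaccel cuda; NVENC encoding via h264_nvenc.

def scale_cmd(src, dst, width, height):
    # Resolution adjustment: decode on the GPU, rescale, encode with NVENC.
    return ["ffmpeg", "-hwaccel", "cuda", "-i", src,
            "-vf", f"scale={width}:{height}",
            "-c:v", "h264_nvenc", dst]

def transcode_cmd(src, dst):
    # Format conversion: re-encode video with NVENC, copy the audio stream.
    return ["ffmpeg", "-hwaccel", "cuda", "-i", src,
            "-c:v", "h264_nvenc", "-c:a", "copy", dst]

def clip_cmd(src, dst, start, end):
    # Clipping: cut [start, end] by stream copy, without re-encoding.
    return ["ffmpeg", "-i", src, "-ss", start, "-to", end,
            "-c", "copy", dst]

cmd = scale_cmd("in.mp4", "out.mp4", 1280, 720)
print(" ".join(cmd))
# The commands would be executed with subprocess.run(cmd, check=True).
```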
4. The distributed teaching video processing system according to claim 1, wherein the automatic human eye mosaic module for teaching video comprises an in-image face detection module, an in-face human eye feature point detection module, a continuous inter-frame human eye feature point tracking module, a human eye feature point region mosaic module, and an image frame and audio synthesis video module, wherein:
the in-image face detection module automatically detects the coordinates of the face region in an image: a deep-learning-based neural network model for face detection is pre-trained and its network weights are saved to local storage; during face detection the weights are loaded from local storage, and when an image frame is input into the model, the model automatically detects the coordinate region of the face in the image;
the in-face human eye feature point detection module automatically detects the coordinates of the human eye feature points in a face: a deep-learning-based neural network model for detecting human eye feature points from a face is pre-trained and its network weights are saved to local storage; during detection the weights are loaded from local storage, and when the face region of an image frame is input into the model, the model automatically detects the human eye feature points of the face;
the continuous inter-frame human eye feature point tracking module tracks the human eye feature points across two consecutive frames: a deep-learning-based neural network model for human eye feature point tracking is pre-trained and its network weights are saved to local storage; during tracking the weights are loaded from local storage, and when the face region of the next frame is input into the model, the tracker tracks the human eye feature points; if the tracking result meets expectations, the next frame is processed; otherwise the result is handed to the human eye feature point detection module, which re-detects the human eye feature points;
the human eye feature point region mosaic module applies a mosaic to the human eye feature point region: after the human eye feature points are detected in the image, the corresponding mosaic region is computed from the feature points and the corresponding eye mosaic is generated;
the image frame and audio synthesis video module synthesizes the mosaicked image frames and the audio into a video file: after the mosaicked image frames are obtained, they are combined with the audio previously separated from the video, and the video processing tool FFMPEG synthesizes the image frames and the audio into the final mosaicked video.
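The mosaic step of claim 4 (compute a region from the detected eye feature points, then pixelate it) can be sketched in pure Python; the image here is a plain list-of-lists of grayscale values for clarity, whereas a real system would operate on BGR frames, e.g. with OpenCV, and the landmark coordinates below are invented test data.

```python
# Minimal sketch of the eye-region mosaic, assuming the eye landmarks
# have already been detected: the bounding box around the landmarks is
# pixelated by averaging each block x block tile.

def eye_bbox(landmarks, pad=1):
    # Bounding box around the detected eye feature points, with padding.
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    return (min(xs) - pad, min(ys) - pad, max(xs) + pad + 1, max(ys) + pad + 1)

def mosaic_region(img, x0, y0, x1, y1, block=2):
    # Replace every pixel in each tile by the tile's mean value.
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            ys = range(by, min(by + block, y1))
            xs = range(bx, min(bx + block, x1))
            vals = [img[y][x] for y in ys for x in xs]
            avg = sum(vals) // len(vals)
            for y in ys:
                for x in xs:
                    img[y][x] = avg
    return img

# Synthetic 8x8 gradient "frame" and two invented eye landmarks.
img = [[(x + y) % 256 for x in range(8)] for y in range(8)]
x0, y0, x1, y1 = eye_bbox([(2, 2), (5, 3)])
mosaic_region(img, x0, y0, x1, y1, block=2)
print(img[2][2], img[2][3])                   # → 3 5 (tile averages)
```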
CN202010114831.7A 2020-02-25 2020-02-25 Distributed teaching video processing system Pending CN111327939A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010114831.7A CN111327939A (en) 2020-02-25 2020-02-25 Distributed teaching video processing system

Publications (1)

Publication Number Publication Date
CN111327939A true CN111327939A (en) 2020-06-23

Family

ID=71171165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010114831.7A Pending CN111327939A (en) 2020-02-25 2020-02-25 Distributed teaching video processing system

Country Status (1)

Country Link
CN (1) CN111327939A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060104366A1 (en) * 2004-11-16 2006-05-18 Ming-Yen Huang MPEG-4 streaming system with adaptive error concealment
CN101420452A (en) * 2008-12-05 2009-04-29 深圳市迅雷网络技术有限公司 Video file publishing method and device
CN109743579A (en) * 2018-12-24 2019-05-10 秒针信息技术有限公司 A kind of method for processing video frequency and device, storage medium and processor
CN110418144A (en) * 2019-08-28 2019-11-05 成都索贝数码科技股份有限公司 A method for transcoding multi-bit-rate video files based on NVIDIA GPU

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WU JIAXIAN: "Automatic Eye Mosaic System Based on Convolutional Neural Networks", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738769A (en) * 2020-06-24 2020-10-02 湖南快乐阳光互动娱乐传媒有限公司 Video processing method and device
CN111738769B (en) * 2020-06-24 2024-02-20 湖南快乐阳光互动娱乐传媒有限公司 Video processing method and device

Similar Documents

Publication Publication Date Title
US11941883B2 (en) Video classification method, model training method, device, and storage medium
US10938725B2 (en) Load balancing multimedia conferencing system, device, and methods
CN113255479A (en) Lightweight human body posture recognition model training method, action segmentation method and device
KR102358464B1 (en) 3d image converter that automaically generates 3d character animation from image infomation using artificial intelligence and 3d image converting system including the same
US20220076374A1 (en) Few-shot Image Generation Via Self-Adaptation
KR20210118437A (en) Image display selectively depicting motion
CN118780984A (en) Image super-resolution reconstruction method and system based on convolution and Transformer hybrid architecture
CN113920010A (en) Super-resolution implementation method and device for image frame
KR20210109244A (en) Device and Method for Image Style Transfer
US11463656B1 (en) System and method for received video performance optimizations during a video conference session
CN118887592B (en) Modality missing RGBT tracking method and system based on missing perception prompts
WO2024253577A1 (en) Video generation method
CN111327939A (en) Distributed teaching video processing system
CN120321436A (en) Video stream transmission method, device and storage medium
JP6843409B1 (en) Learning method, content playback device, and content playback system
CN118317125A (en) Real-time intelligent video analysis system, method and terminal based on edge computing
CN118799460A (en) Video generation method, device, equipment and medium
CN116309690B (en) Target tracking method compatible with local and global view angles based on Transformer structure
JP2015191358A (en) Central person determination system, information terminal to be used by central person determination system, central person determination method, central person determination program, and recording medium
US12323571B2 (en) Method of training a neural network configured for converting 2D images into 3D models
CN110602405A (en) Shooting method and device
US12517584B2 (en) Removing eye blinks from EMG speech signals
US12301785B1 (en) Selective data encoding and machine learning video synthesis for content streaming systems and applications
CN117294905A (en) Method and device for accelerating response speed of remote digital person
CN118661212A (en) Method and apparatus for deep learning of foot contact and forces

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200623