CN111327939A - Distributed teaching video processing system - Google Patents
- Publication number
- CN111327939A (application CN202010114831.7A)
- Authority
- CN
- China
- Prior art keywords
- video
- module
- human eye
- eye feature
- teaching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/165—Detection; Localisation; Normalisation using facial parts and geometric relationships
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/19—Sensors therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/193—Preprocessing; Feature extraction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2181—Source of audio or video content, e.g. local disk arrays comprising remotely distributed storage units, e.g. when movies are replicated over a plurality of video servers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440236—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Ophthalmology & Optometry (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Geometry (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a distributed teaching video processing system comprising a teaching video file transfer module, a GPU-accelerated teaching video processing module, and an automatic human-eye mosaicking module. Each module is implemented as a C++ software component, together forming a teaching video processing system that runs on a server. The file transfer module downloads original video files from a video storage server and uploads the processed files back to it. The GPU-accelerated module performs GPU-accelerated video cropping, transcoding, and resolution adjustment. The mosaicking module automatically mosaics the human eyes appearing in a video. By moving video processing from the local machine to the server, the invention speeds up processing, provides an automatic eye-mosaicking function that common video processing software lacks, and simplifies the video processing workflow.
Description
Technical Field
The invention relates to the technical field of video processing, and in particular to a distributed teaching video processing system.
Background Art
At present, in a learning management system, teaching videos of teachers' lectures are typically recorded and made into online courses for students. Because the raw footage is usually very large in both file size and resolution, some processing is needed to save bandwidth in the learning management system: cropping the original video, reducing its resolution to shrink the file, transcoding it to another format, and so on. In addition, some teaching videos, for example those used in traditional Chinese medicine diagnosis, require higher-level processing such as mosaicking the patients' eyes to protect their privacy.
For these tasks, third-party video processing software such as Adobe Premiere is usually used. For teachers, however, the tasks are tedious and repetitive, and such software is generally aimed at professionals: it is unfriendly to ordinary users, and its readily usable functions are basic while higher-level functions are missing. Moreover, processing these tasks on a personal computer relies heavily on the CPU of a single machine, so efficiency is very low.
Summary of the Invention
The purpose of the invention is to overcome the shortcomings and deficiencies of the prior art by proposing a distributed teaching video processing system that fully exploits the parallel processing power of the GPU for teaching video processing. The system addresses slow local processing and cumbersome local workflows, removes the dependence on third-party video processing software, and provides functions that such software lacks. It acts as a teaching-video processing engine: on one hand it GPU-accelerates conventional operations such as transcoding, resolution adjustment, and cropping; on the other hand it offers higher-level functions such as automatic mosaicking of the human eyes in a teaching video.
To achieve the above purpose, the technical solution provided by the invention is a distributed teaching video processing system comprising:
a teaching video file transfer module, which downloads original video files from a video storage server and uploads the files processed by the video processing system;
a GPU-accelerated teaching video processing module, which adjusts a video's resolution according to the video name and target resolution supplied by the user, transcodes a video according to the video name and target format, and crops a video according to the video name, start time, and end time;
an automatic human-eye mosaicking module, which decomposes a teaching video into image frames, detects the faces in each frame, locates the eye feature points within each face, mosaics the eye feature-point regions, and recombines the mosaicked frames with the audio into a video.
Further, the teaching video file transfer module comprises a user login module, a cookie handling module, a video file download module, and a video file upload module, wherein:
the user login module logs in to the video storage server with the username and password entered by the user;
the cookie handling module saves the cookies returned by the video storage server to local storage and loads them back from local storage;
the video file download module downloads video files from the video storage server;
the video file upload module uploads the processed videos to the video storage server.
Further, the GPU-accelerated teaching video processing module comprises a video resolution adjustment module, a video transcoding module, and a video cropping module, wherein:
the video resolution adjustment module adjusts the resolution: it parses the parameters of the HTTP POST request to obtain the target resolution, then uses the video processing tool FFMPEG to resize the specified video file; during resizing it exploits the parallel computing power of the GPU, using NVIDIA's NVENC and NVDEC to accelerate video encoding and decoding and thus improve processing efficiency;
the video transcoding module converts the video format: it parses the parameters of the HTTP POST request to obtain the target format, then uses FFMPEG to convert the specified video file, likewise using NVENC and NVDEC for GPU-accelerated encoding and decoding;
the video cropping module crops the video: it parses the parameters of the HTTP POST request to obtain the start and end times, then uses FFMPEG to crop the specified video file.
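The patent names FFMPEG with NVIDIA's NVENC/NVDEC but gives no command lines. The following is a minimal sketch of how the three invocations might be assembled; the flag spellings (`-hwaccel cuda`, `h264_nvenc`) follow common FFmpeg usage and are assumptions, not taken from the source.

```python
# Sketch of the three GPU-accelerated FFMPEG invocations described above.
# All flags and codec names are illustrative assumptions; exact availability
# of h264_nvenc depends on the FFmpeg build and the GPU driver.

def scale_cmd(src: str, dst: str, width: int, height: int) -> list[str]:
    """Resolution adjustment: NVDEC-assisted decode, NVENC encode."""
    return ["ffmpeg", "-y", "-hwaccel", "cuda", "-i", src,
            "-vf", f"scale={width}:{height}", "-c:v", "h264_nvenc", dst]

def transcode_cmd(src: str, dst: str) -> list[str]:
    """Format conversion; the target container is implied by dst's extension."""
    return ["ffmpeg", "-y", "-hwaccel", "cuda", "-i", src,
            "-c:v", "h264_nvenc", dst]

def trim_cmd(src: str, dst: str, start: str, end: str) -> list[str]:
    """Trim between start and end timestamps without re-encoding."""
    return ["ffmpeg", "-y", "-ss", start, "-to", end, "-i", src,
            "-c", "copy", dst]
```

A FastCGI handler would pass the returned argv to `subprocess.run(..., check=True)` after extracting the parameters from the POST body.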
Further, the automatic human-eye mosaicking module comprises an in-image face detection module, an eye feature-point detection module, a cross-frame eye feature-point tracking module, an eye-region mosaicking module, and a frame-and-audio composition module, wherein:
the face detection module automatically detects the coordinates of the face region in an image: a deep-learning neural network for face detection is trained in advance and its weights are saved locally; at detection time the weights are loaded, and when an image frame is fed into the model it automatically outputs the coordinate region of each face;
the eye feature-point detection module automatically detects the coordinates of the eye feature points in a face: a deep-learning model for detecting eye feature points within a face is trained in advance and its weights are saved locally; at detection time the weights are loaded, and when the face region of an image frame is fed into the model it automatically outputs the eye feature points;
the tracking module tracks eye feature points across two consecutive frames: a deep-learning model for eye feature-point tracking is trained in advance and its weights are saved locally; when the face region of the next frame is fed into the model, the tracker first tracks the eye feature points; if the tracked result is plausible, processing continues with the next frame, otherwise the frame is handed to the eye feature-point detection module for fresh detection;
the eye-region mosaicking module mosaics the eye feature-point regions: once the eye feature points have been detected in an image, the corresponding mosaic region is computed from them and the mosaic is generated;
the frame-and-audio composition module combines the mosaicked image frames with the audio into a video file: after the mosaicked frames are obtained, FFMPEG combines them with the audio previously separated from the video into the final mosaicked video.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Teaching video processing is moved from the local machine to the server. The server fully exploits the GPU's parallel processing power and, together with FFMPEG, accelerates transcoding, resolution adjustment, and cropping, greatly increasing processing speed and simplifying the workflow; users no longer need to learn and operate third-party video processing software.
2. Building on deep learning, neural networks, and GPU computing power, the system deploys a face detection model, an eye feature-point detection model, and an eye feature-point tracking model, providing an advanced automatic eye-mosaicking function that ordinary video processing software lacks.
3. The system is deployed in a distributed fashion: the teaching video storage server is separated from the teaching video processing servers, the processing tier consists of multiple servers, and the web server Nginx distributes processing requests among them to achieve load balancing.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the overall architecture of the system.
Fig. 2 is a flow chart of the whole video processing procedure.
Detailed Description of the Embodiments
The invention is further described below with reference to a specific embodiment.
The distributed teaching video processing system of this embodiment is developed with Visual Studio Code in the C++ and Python languages and runs on a server. As shown in Fig. 1, the web server Nginx forwards requests and balances load, dispatching video processing requests to FastCGI servers for handling. The system comprises:
a teaching video file transfer module, which downloads original video files from the video storage server and uploads the files processed by the video processing system;
a GPU-accelerated teaching video processing module, which adjusts a video's resolution according to the video name and target resolution supplied by the user, transcodes a video according to the video name and target format, and crops a video according to the video name, start time, and end time;
an automatic human-eye mosaicking module, which decomposes the teaching video into image frames, detects the faces in each frame, locates the eye feature points within each face, mosaics the eye feature-point regions, and makes the mosaicked frames into a video.
The teaching video file transfer module comprises a user login module, a cookie handling module, a video file download module, and a video file upload module, wherein:
the user login module logs in to the video storage server with the username and password entered by the user;
the cookie handling module saves the cookies returned by the video storage server to local storage and loads them back from local storage;
the video file download module downloads video files from the video storage server;
the video file upload module uploads the processed videos to the video storage server.
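The cookie handling and download steps above can be sketched with the standard library; a minimal sketch under stated assumptions: the `/login` and `/videos/` paths and the cookie file name are hypothetical, since the patent does not specify the storage server's API.

```python
import urllib.parse
import urllib.request
from http.cookiejar import LWPCookieJar

def make_session(cookie_file: str):
    """Build an HTTP opener backed by a cookie jar persisted at cookie_file."""
    jar = LWPCookieJar(cookie_file)
    try:
        jar.load(ignore_discard=True)  # restore cookies saved by a previous run
    except (FileNotFoundError, OSError):
        pass                           # first run: no saved cookies yet
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def login(opener, base_url: str, username: str, password: str):
    """POST credentials to the storage server; the /login path is an assumption."""
    data = urllib.parse.urlencode({"username": username,
                                   "password": password}).encode()
    return opener.open(base_url + "/login", data=data)

def download(opener, base_url: str, name: str, dst: str):
    """Fetch the named video file and write it to dst; /videos/ is an assumption."""
    with opener.open(base_url + "/videos/" + urllib.parse.quote(name)) as resp, \
         open(dst, "wb") as out:
        out.write(resp.read())
```

After a successful login the caller would persist the session with `jar.save(ignore_discard=True)` so later download and upload runs reuse it.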
The GPU-accelerated teaching video processing module comprises a video resolution adjustment module, a video transcoding module, and a video cropping module, wherein:
the video resolution adjustment module adjusts the resolution: it parses the parameters of the HTTP POST request to obtain the target resolution, then uses the FFMPEG tool to resize the specified video file; during resizing it exploits the parallel computing power of the GPU, using NVIDIA's NVENC and NVDEC to accelerate video encoding and decoding and thus improve processing efficiency;
the video transcoding module converts the video format: it parses the parameters of the HTTP POST request to obtain the target format, then uses FFMPEG to convert the specified video file, likewise using NVENC and NVDEC for GPU-accelerated encoding and decoding;
the video cropping module crops the video: it parses the parameters of the HTTP POST request to obtain the start and end times, then uses FFMPEG to crop the specified video file.
The automatic human-eye mosaicking module comprises an in-image face detection module, an eye feature-point detection module, an eye feature-point tracking module, an eye-region mosaicking module, and a frame-and-audio composition module, wherein:
the face detection module automatically detects the coordinates of the face region in an image: a deep-learning neural network for face detection is trained in advance; the model improves on the FaceBoxes face detector, refining its network structure and increasing the network's depth and width, and hard-to-distinguish samples that the system does not need are removed before training. After training, the model weights are saved locally. At detection time the weights are loaded; when an image frame is fed into the model, it automatically outputs the coordinate region of each face: the top-left corner, the bottom-right corner, and the center of the face region.
The eye feature-point detection module automatically detects the coordinates of the eye feature points in a face: a deep-learning model for detecting eye feature points within a face is trained in advance; the model improves the O-Net stage of the MTCNN model, with the training dataset adjusted accordingly. After training, the weights are saved locally. At detection time the weights are loaded; when the face region of an image frame is fed into the model, it automatically outputs the eye feature points.
The eye feature-point tracking module tracks eye feature points across two consecutive frames: a deep-learning model for eye feature-point tracking is trained in advance and its weights are saved locally. When the face region of the next frame is fed into the model, the tracker first tracks the eye feature points; if the tracked result is plausible, processing continues with the next frame, otherwise the frame is handed to the eye feature-point detection module for fresh detection.
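The per-frame track-or-redetect policy just described can be sketched as a small loop; `detect`, `track`, and `plausible` stand in for the neural-network models and the tracker's confidence check, none of which the patent specifies in code.

```python
# Sketch of the track-first, redetect-on-failure policy described above.
# detect(frame) -> eye points; track(prev_points, frame) -> eye points or None;
# plausible(points) -> bool. All three callables are assumptions.

def mosaic_pipeline(frames, detect, track, plausible):
    """Yield eye feature points per frame, tracking when possible."""
    prev_points = None
    for frame in frames:
        points = track(prev_points, frame) if prev_points is not None else None
        if points is None or not plausible(points):
            points = detect(frame)  # tracking failed: fall back to detection
        prev_points = points
        yield points
```

Running the full detector only when tracking fails keeps per-frame cost low on the long stretches of video where the face moves smoothly.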
The eye-region mosaicking module mosaics the eye feature-point regions: once the eye feature points have been detected in an image, the mosaic region is computed from them and the corresponding mosaic is generated.
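The patent does not state how the mosaic itself is produced; a common choice is block-averaging pixelation over the rectangle spanned by the eye feature points. The sketch below assumes that scheme, and the block size of 8 is an illustrative default.

```python
import numpy as np

def pixelate(img: np.ndarray, x0: int, y0: int,
             x1: int, y1: int, block: int = 8) -> np.ndarray:
    """Mosaic the rectangle img[y0:y1, x0:x1] with block-by-block averaging.

    The rectangle would be derived from the detected eye feature points;
    the averaging scheme and block size are assumptions, not from the patent.
    """
    out = img.copy()
    for y in range(y0, y1, block):
        for x in range(x0, x1, block):
            cell = out[y:min(y + block, y1), x:min(x + block, x1)]
            # Replace every pixel in the cell by the cell's mean colour.
            cell[...] = cell.mean(axis=(0, 1)).astype(out.dtype)
    return out
```

The same function works for grayscale (H, W) and colour (H, W, 3) frames, since the mean over the first two axes broadcasts back over the cell.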
The frame-and-audio composition module combines the mosaicked image frames with the audio into a video file: after the mosaicked frames are obtained, the FFMPEG tool combines them with the audio previously separated from the video into the final video.
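The final composition step might be invoked as below; the frame naming pattern, frame rate, and codec choice are assumptions for illustration, and the audio stream is copied unchanged as the text describes.

```python
# Sketch of the FFMPEG invocation that joins mosaicked frames with the
# previously separated audio. Pattern, fps, and codec are assumptions.

def mux_cmd(frame_pattern: str, audio: str, dst: str, fps: int = 25) -> list[str]:
    """Build the argv that encodes numbered frames and muxes in the audio."""
    return ["ffmpeg", "-y", "-framerate", str(fps), "-i", frame_pattern,
            "-i", audio, "-c:v", "libx264", "-pix_fmt", "yuv420p",
            "-c:a", "copy", "-shortest", dst]
```

As with the other FFMPEG sketches, the caller would hand the list to `subprocess.run(..., check=True)`.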
The overall processing flow of the distributed teaching video processing system of this embodiment is shown in Fig. 2. The browser first uploads a video to the video storage server; once the upload succeeds, the browser sends a video processing request to the video processing system, which downloads the video from the storage server, processes it, and uploads the processed video back to the storage server.
The embodiment described above is only a preferred embodiment of the invention and does not limit its scope of implementation; any change made according to the shape and principle of the invention shall fall within its scope of protection.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010114831.7A CN111327939A (en) | 2020-02-25 | 2020-02-25 | Distributed teaching video processing system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010114831.7A CN111327939A (en) | 2020-02-25 | 2020-02-25 | Distributed teaching video processing system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN111327939A true CN111327939A (en) | 2020-06-23 |
Family
ID=71171165
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010114831.7A Pending CN111327939A (en) | 2020-02-25 | 2020-02-25 | Distributed teaching video processing system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111327939A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111738769A (en) * | 2020-06-24 | 2020-10-02 | 湖南快乐阳光互动娱乐传媒有限公司 | Video processing method and device |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060104366A1 (en) * | 2004-11-16 | 2006-05-18 | Ming-Yen Huang | MPEG-4 streaming system with adaptive error concealment |
| CN101420452A (en) * | 2008-12-05 | 2009-04-29 | 深圳市迅雷网络技术有限公司 | Video file publishing method and device |
| CN109743579A (en) * | 2018-12-24 | 2019-05-10 | 秒针信息技术有限公司 | A kind of method for processing video frequency and device, storage medium and processor |
| CN110418144A (en) * | 2019-08-28 | 2019-11-05 | 成都索贝数码科技股份有限公司 | A method for transcoding multi-bit-rate video files based on NVIDIA GPU |
Application Events
| Date | Event |
|---|---|
| 2020-02-25 | Application CN202010114831.7A filed; published as CN111327939A; status: active, Pending |
Non-Patent Citations (1)
| Title |
|---|
| Wu Jiaxian: "Automatic Human-Eye Mosaic System Based on Convolutional Neural Networks", China Master's Theses Full-text Database, Information Science and Technology * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111738769A (en) * | 2020-06-24 | 2020-10-02 | 湖南快乐阳光互动娱乐传媒有限公司 | Video processing method and device |
| CN111738769B (en) * | 2020-06-24 | 2024-02-20 | 湖南快乐阳光互动娱乐传媒有限公司 | Video processing method and device |
Similar Documents
| Publication | Title |
|---|---|
| US11941883B2 | Video classification method, model training method, device, and storage medium |
| US10938725B2 | Load balancing multimedia conferencing system, device, and methods |
| CN113255479A | Lightweight human body posture recognition model training method, action segmentation method and device |
| KR102358464B1 | 3D image converter that automatically generates 3D character animation from image information using artificial intelligence, and 3D image converting system including the same |
| US20220076374A1 | Few-shot image generation via self-adaptation |
| KR20210118437A | Image display selectively depicting motion |
| CN118780984A | Image super-resolution reconstruction method and system based on a hybrid convolution and Transformer architecture |
| CN113920010A | Super-resolution implementation method and device for image frames |
| KR20210109244A | Device and method for image style transfer |
| US11463656B1 | System and method for received video performance optimizations during a video conference session |
| CN118887592B | Modality-missing RGBT tracking method and system based on missing-perception prompts |
| WO2024253577A1 | Video generation method |
| CN111327939A | Distributed teaching video processing system |
| CN120321436A | Video stream transmission method, device and storage medium |
| JP6843409B1 | Learning method, content playback device, and content playback system |
| CN118317125A | Real-time intelligent video analysis system, method and terminal based on edge computing |
| CN118799460A | Video generation method, device, equipment and medium |
| CN116309690B | Transformer-based target tracking method compatible with local and global view angles |
| JP2015191358A | Central person determination system, information terminal used by the system, central person determination method, central person determination program, and recording medium |
| US12323571B2 | Method of training a neural network configured for converting 2D images into 3D models |
| CN110602405A | Shooting method and device |
| US12517584B2 | Removing eye blinks from EMG speech signals |
| US12301785B1 | Selective data encoding and machine learning video synthesis for content streaming systems and applications |
| CN117294905A | Method and device for accelerating response speed of remote digital person |
| CN118661212A | Method and apparatus for deep learning of foot contact and forces |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2020-06-23 |